phenopacket_mapper package
A package to map data from a tabular format to the GA4GH Phenopacket schema (v2).
- class phenopacket_mapper.DataModel(name: str, fields: Tuple[DataField | DataSection | OrGroup, ...], id: str = None, resources: Tuple[CodeSystem, ...] = <factory>)
Bases: object
This class defines a data model for medical data using DataField objects.
A data model can be used to import data and map it to the Phenopacket schema. It is made up of a list of DataField objects.
Given that all DataField objects in a DataModel have unique names, the id field is generated from the name. E.g.: DataField(name='Date of Birth', …) will have an id of 'date_of_birth'. The DataField objects can be accessed using the id as an attribute of the DataModel object. E.g.: data_model.date_of_birth. This is useful in the data reading and mapping processes.
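The naming rule above can be illustrated with a small stand-alone sketch. It does not use the package's own code: `name_to_id`, `SketchField`, and `SketchModel` are hypothetical stand-ins, and the exact normalization the package applies (for example to punctuation) is an assumption here.

```python
import re
from dataclasses import dataclass
from typing import Tuple


def name_to_id(name: str) -> str:
    """Hypothetical rule: lowercase, runs of other characters become '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


@dataclass(frozen=True)
class SketchField:
    """Stand-in for DataField; carries only a name and a derived id."""
    name: str

    @property
    def id(self) -> str:
        return name_to_id(self.name)


@dataclass(frozen=True)
class SketchModel:
    """Stand-in for DataModel that exposes its fields as attributes by id."""
    name: str
    fields: Tuple[SketchField, ...]

    def __getattr__(self, attr: str) -> SketchField:
        # Only called when normal attribute lookup fails.
        for field in self.__dict__.get("fields", ()):
            if field.id == attr:
                return field
        raise AttributeError(attr)


model = SketchModel("Demo", (SketchField("Date of Birth"),))
print(model.date_of_birth.name)  # Date of Birth
```

Because the id doubles as an attribute name, the sketch only works when field names are unique, which matches the precondition stated above.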
- Variables:
name – Name of the data model
fields – Tuple of DataField, DataSection, and OrGroup objects
resources – Tuple of CodeSystem objects
- fields: Tuple[DataField | DataSection | OrGroup, ...]
- resources: Tuple[CodeSystem, ...]
- get_field(field_id: str, default: Optional = None) -> DataField | None
Returns a DataField object by its id.
- Parameters:
field_id – The id of the field
default – The default value to return if the field is not found
- Returns:
The DataField object, or default if no field with the given id is found
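The lookup-with-default behaviour described for get_field can be sketched without the package. In this sketch the `fields_by_id` table and its string values are hypothetical placeholders for real DataField objects; only the id and default semantics follow the description above.

```python
from typing import Dict, Optional

# Hypothetical id -> field table standing in for a DataModel's fields.
fields_by_id: Dict[str, str] = {"date_of_birth": "Date of Birth", "sex": "Sex"}


def get_field(field_id: str, default: Optional[str] = None) -> Optional[str]:
    """Return the entry for field_id, or default when the id is unknown."""
    return fields_by_id.get(field_id, default)


print(get_field("sex"))              # Sex
print(get_field("zygosity"))         # None
print(get_field("zygosity", "n/a"))  # n/a
```

This style of lookup suits cases where a field may be absent and the caller would rather receive a fallback value than handle an exception.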
- load_data(path: str | Path, compliance: Literal['lenient', 'strict'] = 'lenient', **kwargs) -> DataSet
Loads data from a file using a DataModel definition.
To call this method, pass the column name for each field in the DataModel as a keyword argument. This is done by passing the field id followed by ‘_column’. E.g. if the DataModel has a field with id ‘date_of_birth’, the column name in the file should be passed as ‘date_of_birth_column’. The method will raise an error if any of the fields are missing.
E.g.:
```python
data_model = DataModel("Test data model", [DataField(name="Field 1", value_set=ValueSet())])
data_model.load_data("data.csv", field_1_column="column_name_in_file")
```
- Parameters:
path – Path to the file containing the data
compliance – Compliance level to use when loading the data.
kwargs – Dynamically passed parameters that match {id}_column for each item
- Returns:
A DataSet holding the DataModelInstance objects loaded from the file
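The `{id}_column` keyword convention can be sketched as follows. This is not the package's implementation: `resolve_columns`, `load_rows`, the field ids, the CSV content, and the choice of `ValueError` are all assumptions made for illustration. The source only states that an error is raised when a field's column is missing.

```python
import csv
import io
from typing import Dict, List, Sequence

FIELD_IDS = ("date_of_birth", "sex")  # hypothetical field ids


def resolve_columns(field_ids: Sequence[str], **kwargs: str) -> Dict[str, str]:
    """Map each field id to the file column passed as '<id>_column'."""
    missing = [fid for fid in field_ids if f"{fid}_column" not in kwargs]
    if missing:
        # Assumed error type; the real method may raise something else.
        raise ValueError(f"Missing column arguments for: {missing}")
    return {fid: kwargs[f"{fid}_column"] for fid in field_ids}


def load_rows(text: str, field_ids: Sequence[str], **kwargs: str) -> List[Dict[str, str]]:
    """Read CSV text and re-key the selected columns by field id."""
    columns = resolve_columns(field_ids, **kwargs)
    reader = csv.DictReader(io.StringIO(text))
    return [{fid: row[col] for fid, col in columns.items()} for row in reader]


csv_text = "DOB,Gender\n1990-04-01,female\n"
rows = load_rows(csv_text, FIELD_IDS, date_of_birth_column="DOB", sex_column="Gender")
print(rows)  # [{'date_of_birth': '1990-04-01', 'sex': 'female'}]
```

Checking for missing keyword arguments before the file is read means a misconfigured call fails early, without partially processing the data.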
- class phenopacket_mapper.PhenopacketMapper(data_model: DataModel, **kwargs)
Bases: object
- map(data: DataSet) -> List[Phenopacket]
Maps data from the DataModel to Phenopackets.
The mapping is based on the definition of the DataModel and the parameters passed to the constructor.
If successful, a list of Phenopackets is returned.
- Parameters:
data – DataSet of DataModelInstance objects created from the data using the DataModel
- Returns:
List of Phenopackets