phenopacket_mapper.pipeline.mapper module
- class phenopacket_mapper.pipeline.mapper.PhenopacketMapper(datamodel: DataModel)
Bases: object
Class to map data using a DataModel to Phenopackets
This class is central to the pipeline for mapping data from a DataModel to Phenopackets. A dataset can be mapped from its tabular format to the Phenopacket schema in a few simple steps:
1. Define the DataModel for the dataset, if it does not exist yet
2. Load the data from the dataset
3. Define the mapping from the DataModel to the Phenopacket schema
4. Perform the mapping
5. Write the Phenopackets to a file
6. Optionally validate the Phenopackets
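The steps above can be sketched roughly as follows. This page documents only the signatures of load_data and map, so the sketch uses a hypothetical stand-in class, StubMapper, that mirrors those two signatures. It is not the library's implementation. The plain dictionaries standing in for DataModelInstance and Phenopacket objects, the helper run_pipeline, and the one-JSON-file-per-record output layout are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Union


class StubMapper:
    """Hypothetical stand-in mirroring the documented method signatures."""

    def load_data(self, path: Union[str, Path]) -> List[Dict[str, Any]]:
        # A real mapper would parse the file; the stub returns fixed rows.
        return [{"patient_id": "P1"}, {"patient_id": "P2"}]

    def map(self, mapping_: Dict[str, str], data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Copy the column named by mapping_["id"] into the target "id" key.
        return [{"id": row[mapping_["id"]]} for row in data]


def run_pipeline(mapper, mapping_, path: Path, output_dir: Path) -> List[Dict[str, Any]]:
    data = mapper.load_data(path)          # step 2: load the data
    packets = mapper.map(mapping_, data)   # step 4: perform the mapping
    for packet in packets:                 # step 5: write one file per record (assumed layout)
        target = output_dir / f"{packet['id']}.json"
        target.write_text(json.dumps(packet), encoding="utf-8")
    return packets


with tempfile.TemporaryDirectory() as tmp:
    packets = run_pipeline(StubMapper(), {"id": "patient_id"}, Path("cohort.csv"), Path(tmp))
    written = sorted(p.name for p in Path(tmp).iterdir())
```

With the real class, the same call order would presumably apply: construct PhenopacketMapper with a DataModel, call load_data, then pass the result to map together with the mapping.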
- load_data(path: str | Path) → List[DataModelInstance]
Load data from a file using the DataModel
Raises an error if the file type is not recognized or if the file does not follow the DataModel.
- Parameters:
path – Path to the file to load
- Returns:
List of DataModelInstances
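The two failure conditions described for load_data (an unrecognized file type, and a file that does not follow the DataModel) can be illustrated with a minimal sketch. The function load_rows, its required_columns parameter, and the choice of ValueError are assumptions; the page states only that "an error" is raised, and the sketch handles CSV alone.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Union


def load_rows(path: Union[str, Path], required_columns: Iterable[str]) -> List[Dict[str, str]]:
    """Hypothetical loader: reads CSV files and checks the expected columns."""
    path = Path(path)
    if path.suffix.lower() != ".csv":
        # Exception type is an assumption; the source only says "an error".
        raise ValueError(f"Unrecognized file type: {path.suffix!r}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = set(required_columns) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"File does not follow the data model, missing: {sorted(missing)}")
        return list(reader)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "cohort.csv"
    csv_path.write_text("patient_id,sex\nP1,FEMALE\n", encoding="utf-8")
    rows = load_rows(csv_path, ["patient_id", "sex"])

    # A file that lacks a required column is rejected.
    try:
        load_rows(csv_path, ["patient_id", "date_of_birth"])
        missing_column_rejected = False
    except ValueError:
        missing_column_rejected = True

# An unknown suffix is rejected before the file is opened.
try:
    load_rows("cohort.xyz", ["patient_id"])
    bad_suffix_rejected = False
except ValueError:
    bad_suffix_rejected = True
```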
- map(mapping_: DataModel2PhenopacketSchema, data: List[DataModelInstance]) → List[Phenopacket]
Map data from the DataModel to Phenopackets
The mapping is based on the definition of the DataModel and the DataModel2PhenopacketSchema mapping.
If successful, a list of Phenopackets will be returned
- Parameters:
mapping_ – Mapping from the DataModel to the Phenopacket schema, defined in DataModel2PhenopacketSchema
data – List of DataModelInstances created from the data using the DataModel
- Returns:
List of Phenopackets
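As a rough illustration of what such a mapping step might do, the sketch below turns tabular rows into nested, Phenopacket-shaped dictionaries. The function map_rows and the dotted-path convention for target fields are hypothetical and are not how DataModel2PhenopacketSchema is defined. Only the target field names (id, subject.id, subject.sex) are taken from the Phenopacket schema.

```python
from typing import Any, Dict, List


def map_rows(mapping_: Dict[str, str], data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical mapper: mapping_ maps dotted target paths to source columns."""
    packets = []
    for row in data:
        packet: Dict[str, Any] = {}
        for target_path, source_column in mapping_.items():
            *parents, leaf = target_path.split(".")
            node = packet
            for key in parents:
                # Create intermediate objects such as "subject" on demand.
                node = node.setdefault(key, {})
            node[leaf] = row[source_column]
        packets.append(packet)
    return packets


mapping_ = {"id": "patient_id", "subject.id": "patient_id", "subject.sex": "sex"}
data = [{"patient_id": "P1", "sex": "FEMALE"}]
packets = map_rows(mapping_, data)
```

Each input row yields one output record, which matches the documented return type of a list of Phenopackets, one per DataModelInstance in this sketch.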
- phenopacket_mapper.pipeline.mapper.mapping(path: Path, output: Path, validate_: bool, datamodel: DataModel = DataModel(data_model_name='ERDRI_CDS', fields=[], resources=[]))
Executes the pipeline that maps a dataset, in the format described by datamodel, to the Phenopacket schema
- Parameters:
path – Path to the formatted CSV or Excel file
output – Path to write Phenopackets to
validate_ – Validate the Phenopackets using phenopacket-tools after creation
datamodel – DataModel to use for the mapping; defaults to the ERDRI_CDS DataModel shown in the signature