phenopacket_mapper.pipeline package

This module includes the pipeline for mapping data to phenopackets.

phenopacket_mapper.pipeline.read_file(path: str | Path, data_model: DataModel = DataModel(data_model_name='ERDRI_CDS', fields=[], resources=[]), file_type: Literal['csv', 'excel', 'unknown'] = 'unknown') → List[DataModelInstance]

Reads a CSV or Excel file using a DataModel definition and returns a list of DataModelInstances

Parameters:
  • path – Path to formatted csv or excel file

  • file_type – Type of file to read: ‘csv’, ‘excel’, or ‘unknown’ (the default)

  • data_model – DataModel to use for reading the file

Returns:

List of DataModelInstances
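The signature allows file_type='unknown' as the default, but this page does not say how that value is handled. One plausible approach is to fall back on the file extension. The sketch below illustrates that idea only; resolve_file_type and its extension table are hypothetical and are not part of phenopacket_mapper.

```python
from pathlib import Path
from typing import Literal, Union

FileType = Literal["csv", "excel", "unknown"]

# Assumption for illustration: which suffixes count as CSV or Excel.
_EXTENSIONS = {".csv": "csv", ".xlsx": "excel", ".xls": "excel"}


def resolve_file_type(path: Union[str, Path], file_type: FileType = "unknown") -> str:
    """Return 'csv' or 'excel', guessing from the suffix when 'unknown'.

    Hypothetical helper, not the library's implementation.
    """
    # An explicit file type always wins over the guess.
    if file_type != "unknown":
        return file_type
    suffix = Path(path).suffix.lower()
    try:
        return _EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer file type from suffix {suffix!r}") from None
```

Passing file_type explicitly avoids the guess altogether, which is the safer choice when a file has an unusual extension.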

phenopacket_mapper.pipeline.read_redcap_api(data_model: DataModel) → List[DataModelInstance]

Reads data from REDCap API and returns a list of DataModelInstances

Parameters:

data_model – DataModel to use for reading the data

Returns:

List of DataModelInstances

phenopacket_mapper.pipeline.read_phenopackets(path: Path) → List[Phenopacket]

Reads Phenopackets from a file

Parameters:

path – Path to Phenopackets file

Returns:

List of Phenopackets

phenopacket_mapper.pipeline.read_data_model(data_model_name: str, resources: List[CodeSystem], path: str | Path, file_type: Literal['csv', 'excel', 'unknown'] = 'unknown', column_names: Dict[str, str] = mappingproxy({'name': 'data_field_name', 'section': 'data_model_section', 'description': 'description', 'data_type': 'data_type', 'required': 'required', 'specification': 'specification', 'ordinal': 'ordinal'}), parse_data_types: bool = False, compliance: Literal['soft', 'hard'] = 'soft', remove_line_breaks: bool = False, parse_ordinals: bool = True) → DataModel

Reads a Data Model from a file

Parameters:
  • data_model_name – Name to be given to the DataModel object

  • resources – List of CodeSystem objects to be used as resources in the DataModel

  • path – Path to Data Model file

  • file_type – Type of file to read: ‘csv’, ‘excel’, or ‘unknown’ (the default)

  • column_names – A dictionary mapping from each field of the DataField (key) class to a column of the file (value). Leaving a value empty (‘’) will leave the field in the DataModel definition empty.

  • parse_data_types – If True, parses the string to a list of CodeSystems and types, can later be used to check validity of the data. Optional, but highly recommended.

  • compliance – Only applicable if parse_data_types=True, otherwise does nothing. ‘soft’ raises warnings upon encountering invalid data types, ‘hard’ raises ValueError.

  • remove_line_breaks – Whether to remove line breaks from string values

  • parse_ordinals – Whether to extract the ordinal number from the field name. Warning: this can overwrite values. Ordinals can look like “1.1.”, “1.”, “I.a.”, or “ii.”, etc.
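To make the parse_ordinals and compliance options more concrete, here is a minimal sketch of the two behaviours described above. It is not the library's implementation: split_ordinal, check_data_type, the regular expression, and the set of known type names are all assumptions chosen for illustration.

```python
import re
import warnings
from typing import Optional, Tuple

# One ordinal segment: digits, a Roman numeral, or a single letter (assumption).
_SEGMENT = r"(?:\d+|[ivxlcdm]+|[a-z])"
# One or more dot-terminated segments at the start, then the remaining name.
_ORDINAL = re.compile(rf"^\s*((?:{_SEGMENT}\.)+)\s*(.*)$", re.IGNORECASE | re.DOTALL)


def split_ordinal(field_name: str) -> Tuple[Optional[str], str]:
    """Split a leading ordinal such as '1.1.' or 'I.a.' off a field name."""
    match = _ORDINAL.match(field_name)
    if match is None:
        return None, field_name.strip()
    return match.group(1), match.group(2).strip()


# Hypothetical set of type names treated as valid in this sketch.
_KNOWN_TYPES = {"str", "int", "float", "bool", "date"}


def check_data_type(type_name: str, compliance: str = "soft") -> bool:
    """Return True for a known type; otherwise warn ('soft') or raise ('hard')."""
    if type_name.strip().lower() in _KNOWN_TYPES:
        return True
    message = f"Unrecognised data type: {type_name!r}"
    if compliance == "hard":
        raise ValueError(message)
    warnings.warn(message)
    return False
```

With this pattern, a field name beginning with an abbreviation such as "e.g. weight" is also read as starting with an ordinal. That is one way values can end up overwritten, so parse_ordinals=False may be preferable for data models whose field names carry no numbering.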

phenopacket_mapper.pipeline.write(phenopackets: List[Phenopacket], output_path: str | Path)

Write a list of phenopackets to a file

Parameters:
  • phenopackets – List of phenopackets

  • output_path – The path to write the phenopackets to

class phenopacket_mapper.pipeline.PhenopacketMapper(datamodel: DataModel)

Bases: object

Class to map data using a DataModel to Phenopackets

This class is central to the pipeline for mapping data from a DataModel to Phenopackets. A dataset can be mapped from its tabular format to the Phenopacket schema in a few simple steps:

  1. Define the DataModel for the dataset, if it does not exist yet

  2. Load the data from the dataset

  3. Define the mapping from the DataModel to the Phenopacket schema

  4. Perform the mapping

  5. Write the Phenopackets to a file

  6. Optionally validate the Phenopackets
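Steps 2, 4, and 5 correspond to the load_data, map, and write methods documented below, and can be chained as in the following sketch. run_pipeline is a hypothetical wrapper, not part of the package. It assumes the caller has already built a PhenopacketMapper (step 1) and a DataModel2PhenopacketSchema mapping (step 3), and it relies only on the method signatures listed on this page.

```python
from pathlib import Path
from typing import Any, List, Union


def run_pipeline(
    mapper: Any,
    mapping: Any,
    data_path: Union[str, Path],
    output_path: Union[str, Path],
) -> List[Any]:
    """Load the data, map it, and write the result (hypothetical wrapper).

    `mapper` is expected to behave like PhenopacketMapper and `mapping`
    like DataModel2PhenopacketSchema; both are created by the caller.
    """
    data = mapper.load_data(data_path)        # step 2: load the dataset
    phenopackets = mapper.map(mapping, data)  # step 4: perform the mapping
    # Step 5: write() is documented to return True on success, False otherwise.
    if not mapper.write(phenopackets, output_path):
        raise RuntimeError(f"Could not write phenopackets to {output_path}")
    return phenopackets
```

Turning a False return value from write into an exception is a design choice of this sketch, so that a failed write cannot go unnoticed. Validation (step 6) is left out because this page does not document a validation call.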

load_data(path: str | Path) → List[DataModelInstance]

Load data from a file using the DataModel

Raises an error if the file type is not recognized or the file does not follow the DataModel

Parameters:

path – Path to the file to load

Returns:

List of DataModelInstances

map(mapping_: DataModel2PhenopacketSchema, data: List[DataModelInstance]) → List[Phenopacket]

Map data from the DataModel to Phenopackets

The mapping is based on the definition of the DataModel and the DataModel2PhenopacketSchema mapping.

If successful, a list of Phenopackets will be returned

Parameters:
  • mapping_ – Mapping from the DataModel to the Phenopacket schema, defined in DataModel2PhenopacketSchema

  • data – List of DataModelInstances created from the data using the DataModel

Returns:

List of Phenopackets

write(phenopackets: List[Phenopacket], output_path: str | Path) → bool

Write Phenopackets to a file

Parameters:
  • phenopackets – List of Phenopackets to write

  • output_path – Path to write the Phenopackets to

Returns:

True if successful, False otherwise

Submodules