phenopacket_mapper.pipeline package
This module includes the pipeline for mapping data to phenopackets.
- phenopacket_mapper.pipeline.read_file(path: str | Path, data_model: DataModel = DataModel(data_model_name='ERDRI_CDS', fields=[], resources=[]), file_type: Literal['csv', 'excel', 'unknown'] = 'unknown') → List[DataModelInstance]
Reads a CSV or Excel file using a DataModel definition and returns a list of DataModelInstances
- Parameters:
path – Path to formatted csv or excel file
file_type – Type of file to read, either ‘csv’ or ‘excel’
data_model – DataModel to use for reading the file
- Returns:
List of DataModelInstances
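The signature defaults file_type to 'unknown', while the parameter description lists only 'csv' and 'excel'. The following is a minimal sketch of one way such a default could be resolved. It assumes the type is inferred from the file extension. resolve_file_type and its suffix table are hypothetical and not part of phenopacket_mapper:

```python
from pathlib import Path
from typing import Union

# Hypothetical suffix table; the real read_file may accept other extensions.
_SUFFIX_TO_TYPE = {".csv": "csv", ".xlsx": "excel", ".xls": "excel"}


def resolve_file_type(path: Union[str, Path], file_type: str = "unknown") -> str:
    """Return 'csv' or 'excel', inferring it from the suffix when file_type is 'unknown'."""
    if file_type in ("csv", "excel"):
        return file_type  # an explicitly given type is used as-is
    if file_type != "unknown":
        raise ValueError(f"Unsupported file_type: {file_type!r}")
    suffix = Path(path).suffix.lower()  # case-insensitive: ".XLSX" -> ".xlsx"
    if suffix not in _SUFFIX_TO_TYPE:
        raise ValueError(f"Cannot infer the file type of {str(path)!r}")
    return _SUFFIX_TO_TYPE[suffix]
```

Passing file_type explicitly skips the inference, which is useful when a file has a misleading or missing extension.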
- phenopacket_mapper.pipeline.read_redcap_api(data_model: DataModel) → List[DataModelInstance]
Reads data from the REDCap API and returns a list of DataModelInstances
- Parameters:
data_model – DataModel to use for reading the data
- Returns:
List of DataModelInstances
- phenopacket_mapper.pipeline.read_phenopackets(path: Path) → List[Phenopacket]
Reads Phenopackets from a file
- Parameters:
path – Path to Phenopackets file
- Returns:
List of Phenopackets
- phenopacket_mapper.pipeline.read_data_model(data_model_name: str, resources: List[CodeSystem], path: str | Path, file_type: Literal['csv', 'excel', 'unknown'] = 'unknown', column_names: Dict[str, str] = mappingproxy({'name': 'data_field_name', 'section': 'data_model_section', 'description': 'description', 'data_type': 'data_type', 'required': 'required', 'specification': 'specification', 'ordinal': 'ordinal'}), parse_data_types: bool = False, compliance: Literal['soft', 'hard'] = 'soft', remove_line_breaks: bool = False, parse_ordinals: bool = True) → DataModel
Reads a Data Model from a file
- Parameters:
data_model_name – Name to be given to the DataModel object
resources – List of CodeSystem objects to be used as resources in the DataModel
path – Path to Data Model file
file_type – Type of file to read, either ‘csv’ or ‘excel’
column_names – A dictionary mapping from each field of the DataField (key) class to a column of the file (value). Leaving a value empty (‘’) will leave the field in the DataModel definition empty.
parse_data_types – If True, parses the data type string into a list of CodeSystems and types, which can later be used to check the validity of the data. Optional, but highly recommended.
compliance – Only applicable if parse_data_types=True, otherwise does nothing. ‘soft’ raises warnings upon encountering invalid data types, ‘hard’ raises ValueError.
remove_line_breaks – Whether to remove line breaks from string values
parse_ordinals – Whether to extract the ordinal number from the field name. Warning: this can overwrite values. Ordinals can look like, for example, “1.1.”, “1.”, “I.a.”, or “ii.”.
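To make column_names and parse_ordinals concrete, here is a minimal sketch of how one row of a data model file could become a field definition. The DataField stand-in, parse_field_row, and the ordinal regular expression are hypothetical illustrations, not the library's implementation. The stand-in omits the data_type, required, and specification attributes that the real default column_names also maps, and the pattern only targets the ordinal shapes listed above:

```python
import re
from dataclasses import dataclass
from typing import Dict


# Hypothetical stand-in for DataField with only a subset of its attributes.
@dataclass
class DataField:
    name: str = ""
    section: str = ""
    description: str = ""
    ordinal: str = ""


# Hypothetical pattern: one or more segments such as "1.", "I.", "a." or "ii." at the
# start of the name (digits, Roman numeral letters, or a single letter, each ending
# in "."), followed by whitespace and the remaining field name.
_ORDINAL = re.compile(r"^\s*((?:(?:\d+|[ivxlcdmIVXLCDM]+|[A-Za-z])\.)+)\s+(.*)$")


def parse_field_row(
    row: Dict[str, str],
    column_names: Dict[str, str],
    parse_ordinals: bool = True,
) -> DataField:
    """Build a DataField from one row of a data model file."""
    values = {}
    for attribute, column in column_names.items():
        # An empty column name ('') leaves the attribute empty.
        values[attribute] = row.get(column, "") if column else ""
    field = DataField(**values)
    if parse_ordinals:
        match = _ORDINAL.match(field.name)
        if match:
            # Overwrites whatever the ordinal column provided for this row.
            field.ordinal, field.name = match.group(1), match.group(2)
    return field
```

For a row whose data_field_name is "1.2. Date of birth", the sketch yields name "Date of birth" and ordinal "1.2.". With parse_ordinals=False, the name is kept verbatim.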
- phenopacket_mapper.pipeline.write(phenopackets: List[Phenopacket], output_path: str | Path)
Write a list of phenopackets to a file
- Parameters:
phenopackets – List of phenopackets
output_path – The path to write the phenopackets to
- class phenopacket_mapper.pipeline.PhenopacketMapper(datamodel: DataModel)
Bases: object
Class to map data using a DataModel to Phenopackets
This class is central to the pipeline for mapping data from a DataModel to Phenopackets. A dataset can be mapped from its tabular format to the Phenopacket schema in a few simple steps:
1. Define the DataModel for the dataset, if it does not exist yet
2. Load the data from the dataset
3. Define the mapping from the DataModel to the Phenopacket schema
4. Perform the mapping
5. Write the Phenopackets to a file
6. Optionally, validate the Phenopackets
- load_data(path: str | Path) → List[DataModelInstance]
Load data from a file using the DataModel
Will raise an error if the file type is not recognized or the file does not follow the DataModel
- Parameters:
path – Path to the file to load
- Returns:
List of DataModelInstances
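load_data raises an error when the file does not follow the DataModel. One plausible form of that check is sketched below. It assumes that conformance means every required field has a column in the file header. check_header and the (name, required) pairs are hypothetical, and the library's actual check may be stricter:

```python
from typing import Iterable, List, Tuple


def check_header(header: Iterable[str], fields: List[Tuple[str, bool]]) -> None:
    """Raise ValueError if a required field has no matching column in the header."""
    columns = {column.strip() for column in header}  # ignore stray whitespace
    missing = [name for name, required in fields if required and name not in columns]
    if missing:
        raise ValueError(f"File does not follow the DataModel; missing columns: {missing}")
```

In practice the header would come from the first row of the file, for example next(csv.reader(handle)) for a CSV file. Optional fields are allowed to be absent.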
- map(mapping_: DataModel2PhenopacketSchema, data: List[DataModelInstance]) → List[Phenopacket]
Map data from the DataModel to Phenopackets
The mapping is based on the definition of the DataModel and the DataModel2PhenopacketSchema mapping.
If successful, a list of Phenopackets will be returned
- Parameters:
mapping_ – Mapping from the DataModel to the Phenopacket schema, defined as a DataModel2PhenopacketSchema
data – List of DataModelInstances created from the data using the DataModel
- Returns:
List of Phenopackets
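The shape of the mapping step can be illustrated without the library. In this sketch, a plain dictionary from data field names to dotted target paths stands in for DataModel2PhenopacketSchema, and nested dictionaries stand in for Phenopacket messages. map_instances, set_path, and the target paths are hypothetical, so this only shows the one-instance-in, one-phenopacket-out idea:

```python
from typing import Any, Dict, List


def set_path(target: Dict[str, Any], dotted: str, value: Any) -> None:
    """Assign value at a dotted path such as 'subject.id', creating dicts as needed."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value


def map_instances(
    mapping: Dict[str, str], data: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Map each data instance to one phenopacket-like dict, skipping empty values."""
    phenopackets = []
    for instance in data:
        packet: Dict[str, Any] = {}
        for field_name, path in mapping.items():
            value = instance.get(field_name)
            if value not in (None, ""):
                set_path(packet, path, value)
        phenopackets.append(packet)
    return phenopackets
```

Empty values are skipped rather than written as empty strings, so a missing measurement leaves the corresponding part of the output absent.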