phenopacket_mapper.utils.io.input module

phenopacket_mapper.utils.io.input.read_data_model(data_model_name: str, resources: Tuple[CodeSystem, ...], path: str | Path, file_type: Literal['csv', 'excel', 'unknown'] = 'unknown', column_names: Dict[str, str] = mappingproxy({'name': 'data_field_name', 'description': 'description', 'specification': 'value_set', 'required': 'required'}), parse_value_sets: bool = False, remove_line_breaks: bool = False, parse_ordinals: bool = True) DataModel[source]

Reads a Data Model from a file

Parameters:
  • data_model_name – Name to be given to the DataModel object

  • resources – Tuple of CodeSystem objects to be used as resources in the DataModel

  • path – Path to Data Model file

  • file_type – Type of file to read, either ‘csv’ or ‘excel’. Defaults to ‘unknown’.

  • column_names – A dictionary mapping from each field of the DataField (key) class to a column of the file (value). Leaving a value empty (‘’) will leave the field in the DataModel definition empty.

  • parse_value_sets – If True, parses the string to a ValueSet object, can later be used to check validity of the data. Optional, but highly recommended.

  • remove_line_breaks – Whether to remove line breaks from string values

  • parse_ordinals – Whether to extract the ordinal number from the field name. Warning: this can overwrite values. Ordinals can look like “1.1.”, “1.”, “I.a.”, or “ii.”, etc.
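
As a sketch of the expected input, the following builds a two-field model definition in CSV form using the default column names from the signature’s `column_names` mapping. The field names and value sets are illustrative assumptions, and the commented-out `read_data_model` call only indicates how such a file might then be loaded:

```python
import csv
import io

# A minimal data-model definition laid out with the default column headers
# expected by read_data_model. The two rows are placeholder assumptions.
rows = [
    {"data_field_name": "1.1. Date of Birth", "description": "Date of birth",
     "value_set": "date", "required": "True"},
    {"data_field_name": "1.2. Sex", "description": "Sex at birth",
     "value_set": "Male, Female, Other", "required": "True"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["data_field_name", "description", "value_set", "required"]
)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# With csv_text written to disk, loading it might look like this
# (hypothetical resources tuple; not executed here):
# data_model = read_data_model(
#     data_model_name="Example model",
#     resources=(),
#     path="example_model.csv",
#     file_type="csv",
#     parse_value_sets=True,
# )
```

Note that `parse_ordinals=True` (the default) would strip the leading “1.1.” / “1.2.” prefixes shown above out of the field names.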

phenopacket_mapper.utils.io.input.load_tabular_data_using_data_model(file: str | Path | IOBase | List[str] | List[Path] | List[IOBase], data_model: DataModel, column_names: Dict[str, str], compliance: Literal['lenient', 'strict'] = 'lenient') DataSet[source]

Loads data from a file using a DataModel definition

List a column for each field of the DataModel in the column_names dictionary. The keys of the dictionary should be {id}_column for each field and the values should be the name of the column in the file.

E.g.:

```python
data_model = DataModel("Test data model", [DataField(name="Field 1", value_set=ValueSet())])
column_names = {"field_1_column": "column_name_in_file"}
load_tabular_data_using_data_model("data.csv", data_model, column_names)
```

Parameters:
  • file – path(s) or file object(s) to load tabular data from

  • data_model – DataModel to use for reading the file

  • column_names – A dictionary mapping from the id of each field of the DataField to the name of a column in the file

  • compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.

Returns:

DataSet containing the loaded DataModelInstances
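
The docstring’s own example maps the DataField named “Field 1” to the key `"field_1_column"`, which suggests field ids are derived by lowercasing the name and replacing spaces and dots with underscores. The helper below sketches that assumption for building `column_names`; verify it against how your DataField ids are actually generated:

```python
# Hypothetical helper: derives the "{id}_column" key for a DataField name,
# assuming ids come from lowercasing and underscore-joining the name.
def column_key(field_name: str) -> str:
    ident = field_name.strip().lower()
    for ch in (" ", ".", "-"):
        ident = ident.replace(ch, "_")
    while "__" in ident:
        ident = ident.replace("__", "_")
    return f"{ident.strip('_')}_column"

# Map each field to the column holding its values in the file.
column_names = {column_key(name): col for name, col in [
    ("Field 1", "column_name_in_file"),
    ("Date of Birth", "dob"),
]}
```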

phenopacket_mapper.utils.io.input.read_phenopackets(dir_path: Path) List[Phenopacket][source]

Reads a list of Phenopackets from JSON files in a directory.

Parameters:

dir_path (Union[str, Path]) – The directory containing JSON files.

Returns:

The list of loaded Phenopackets.

Return type:

List[Phenopacket]
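
read_phenopackets scans a directory for JSON files. The sketch below builds such a directory with two minimal JSON documents (their contents are placeholder stand-ins, not full Phenopacket messages) and shows what the function would pick up; the actual call is left commented out:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp)
    # Two placeholder JSON files standing in for serialized Phenopackets.
    for i in range(2):
        (dir_path / f"patient_{i}.json").write_text(json.dumps({"id": f"patient_{i}"}))

    json_files = sorted(dir_path.glob("*.json"))
    loaded = [json.loads(p.read_text()) for p in json_files]

    # The real call would be:
    # phenopackets = read_phenopackets(dir_path)
```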

phenopacket_mapper.utils.io.input.read_phenopacket_from_json(path: str | Path) Phenopacket[source]

Reads a Phenopacket from a JSON file.

Parameters:

path (Union[str, Path]) – The path to the JSON file.

Returns:

The loaded Phenopacket.

Return type:

Phenopacket

phenopacket_mapper.utils.io.input.load_hierarchical_data_recursive(loaded_data_instance_identifier: int | str, loaded_data_instance: Dict, data_model: DataModel | DataSection | OrGroup | DataField, resources: Tuple[CodeSystem, ...], compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None) Tuple | DataModelInstance | DataSectionInstance | DataFieldValue | None[source]

Helper method for load_hierarchical_data, recurses through hierarchical DataModel

loaded_data_instance is expected to be a dictionary as returned by DataReader.data when reading a single XML or JSON file

Parameters:
  • loaded_data_instance_identifier – identifier of the loaded data_instance

  • loaded_data_instance – data loaded in by DataReader

  • data_model – DataModel, or the DataSection, OrGroup, or DataField sub-element currently being recursed into

  • resources – Tuple of CodeSystem objects to be used as resources in the DataModel

  • compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.

  • mapping – specifies the mapping from data fields present in the data model to identifiers of fields in the data
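
Hierarchical data read from XML or JSON arrives as nested dictionaries. The standalone sketch below (not the library’s implementation) illustrates the kind of recursion this helper performs: walking the nested structure and producing dotted identifiers of the sort a `mapping` could refer to. The record layout is an invented example:

```python
# Recursively flatten a nested dict into {dotted_path: leaf_value} pairs.
def flatten(node, prefix=""):
    if not isinstance(node, dict):
        return {prefix: node}
    out = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        out.update(flatten(value, path))
    return out

record = {"subject": {"id": "P1", "vital_status": {"status": "ALIVE"}}}
flat = flatten(record)
```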

phenopacket_mapper.utils.io.input.load_hierarchical_dataset(file: str | Path | List[str] | List[Path] | List[IOBase], data_model: DataModel, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]
phenopacket_mapper.utils.io.input.load_hierarchical_data(file: str | Path | IOBase, data_model: DataModel, instance_identifier: int | str = None, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]

Loads a single hierarchical data instance from one hierarchical file using a DataModel definition

Parameters:
  • file – file to load data from

  • data_model – DataModel to use for reading the file

  • instance_identifier – identifier of the data instance

  • file_extension – file extension of the file

  • compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.

  • mapping – specifies the mapping from data fields present in the data model to ids of fields in the data
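
The sketch below prepares one hierarchical JSON file and shows, with plain-stdlib code, how identifier strings locate values inside it; the JSON layout and the identifier strings in `field_paths` are illustrative assumptions (in real use the mapping keys are DataField objects, not strings), and the load_hierarchical_data call itself is left commented out:

```python
import json
import tempfile
from pathlib import Path

record = {"subject": {"id": "P1", "sex": "FEMALE"}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "record.json"
    path.write_text(json.dumps(record))

    # Hypothetical field-id -> identifier pairs; real mappings use DataFields.
    field_paths = {"pseudonym": "subject.id", "sex": "subject.sex"}

    # Resolve a dotted identifier against the nested data.
    def lookup(data, dotted):
        for part in dotted.split("."):
            data = data[part]
        return data

    data = json.loads(path.read_text())
    values = {fid: lookup(data, p) for fid, p in field_paths.items()}

    # The real call would resemble:
    # instance = load_hierarchical_data(path, data_model, instance_identifier="P1",
    #                                   file_extension="json", mapping=mapping)
```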