phenopacket_mapper.utils.io package

This package handles reading and writing data.

phenopacket_mapper.utils.io.read_json(path: str | Path | IOBase) Dict[source]

Read a JSON file into a dictionary.

phenopacket_mapper.utils.io.read_xml(path: str | Path | IOBase, encoding='utf-8') Dict[source]

Read an XML file into a dictionary.

phenopacket_mapper.utils.io.parse_xml(file: IOBase) Dict[source]

Parse an XML file into a dictionary with inferred types.
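The exact dictionary shape produced by parse_xml is not specified above. The following standard-library sketch illustrates the kind of element-to-dictionary conversion with type inference the description refers to; infer_type and xml_to_dict are hypothetical helpers, not part of this package:

```python
# Hedged illustration only: this is NOT the package's implementation, just a
# stdlib sketch of converting XML elements to a dict with inferred types.
import io
import xml.etree.ElementTree as ET

def infer_type(text):
    """Best-effort conversion of an XML text node to int, float, bool, or str."""
    if text is None:
        return None
    text = text.strip()
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text

def xml_to_dict(file):
    """Parse an XML file object into a nested dict keyed by element tags."""
    root = ET.parse(file).getroot()
    def convert(elem):
        if len(elem) == 0:  # leaf element: infer the value's type
            return infer_type(elem.text)
        return {child.tag: convert(child) for child in elem}
    return {root.tag: convert(root)}

doc = io.StringIO("<patient><age>42</age><alive>true</alive></patient>")
result = xml_to_dict(doc)
# result == {"patient": {"age": 42, "alive": True}}
```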

class phenopacket_mapper.utils.io.DataReader(file: str | Path | IOBase | List[str] | List[Path] | List[IOBase], encoding: str = 'utf-8', file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None)[source]

Bases: object

Reads data from one file, a list of files, or a file object; when file_extension is not given, the format is inferred from the file extension.

handle_file_extension(fe: str)[source]
phenopacket_mapper.utils.io.read_data_model(data_model_name: str, resources: Tuple[CodeSystem, ...], path: str | Path, file_type: Literal['csv', 'excel', 'unknown'] = 'unknown', column_names: Dict[str, str] = mappingproxy({'name': 'data_field_name', 'description': 'description', 'specification': 'value_set', 'required': 'required'}), parse_value_sets: bool = False, remove_line_breaks: bool = False, parse_ordinals: bool = True) DataModel[source]

Reads a DataModel from a file.

Parameters:
  • data_model_name – Name to be given to the DataModel object

  • resources – List of CodeSystem objects to be used as resources in the DataModel

  • path – Path to Data Model file

  • file_type – Type of file to read, either ‘csv’ or ‘excel’; defaults to ‘unknown’

  • column_names – A dictionary mapping from each field of the DataField (key) class to a column of the file (value). Leaving a value empty (‘’) will leave the field in the DataModel definition empty.

  • parse_value_sets – If True, parses the value set strings into ValueSet objects, which can later be used to check the validity of the data. Optional, but highly recommended.

  • remove_line_breaks – Whether to remove line breaks from string values

  • parse_ordinals – Whether to extract the ordinal number from the field name. Warning: this can overwrite values. Ordinals can look like “1.1.”, “1.”, “I.a.”, or “ii.”, etc.
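The keys below are the documented defaults for the column_names parameter; the custom variant and the commented-out call (file name, resources) are invented for illustration:

```python
# Default column_names mapping for read_data_model (documented above).
column_names = {
    "name": "data_field_name",     # column holding each field's name
    "description": "description",  # column holding the description
    "specification": "value_set",  # column holding the value set
    "required": "required",        # column flagging required fields
}

# Leaving a value empty ('') leaves that field empty in the DataModel definition:
column_names_no_description = {**column_names, "description": ""}

# Hedged usage sketch (requires phenopacket_mapper; file name is an assumption):
# from phenopacket_mapper.utils.io import read_data_model
# dm = read_data_model(
#     data_model_name="My model",
#     resources=(),
#     path="model.csv",
#     file_type="csv",
#     column_names=column_names,
#     parse_value_sets=True,
# )
```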

phenopacket_mapper.utils.io.read_phenopackets(dir_path: Path) List[Phenopacket][source]

Reads a list of Phenopackets from JSON files in a directory.

Parameters:

dir_path (Union[str, Path]) – The directory containing JSON files.

Returns:

The list of loaded Phenopackets.

Return type:

List[Phenopacket]
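The directory scan can be pictured with a standard-library sketch; plain dicts stand in for Phenopacket objects, and read_json_dir is a hypothetical helper, not part of this package:

```python
# Hedged illustration: read_phenopackets loads every JSON file in a directory.
# Phenopacket parsing is omitted here; json.loads stands in for it.
import json
import tempfile
from pathlib import Path

def read_json_dir(dir_path):
    """Load every *.json file in dir_path as a dict, in sorted order."""
    dir_path = Path(dir_path)
    return [json.loads(p.read_text()) for p in sorted(dir_path.glob("*.json"))]

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "pp1.json").write_text('{"id": "patient-1"}')
    (Path(d) / "pp2.json").write_text('{"id": "patient-2"}')
    loaded = read_json_dir(d)
# loaded == [{"id": "patient-1"}, {"id": "patient-2"}]
```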

phenopacket_mapper.utils.io.read_phenopacket_from_json(path: str | Path) Phenopacket[source]

Reads a Phenopacket from a JSON file.

Parameters:

path (Union[str, Path]) – The path to the JSON file.

Returns:

The loaded Phenopacket.

Return type:

Phenopacket

phenopacket_mapper.utils.io.load_tabular_data_using_data_model(file: str | Path | IOBase | List[str] | List[Path] | List[IOBase], data_model: DataModel, column_names: Dict[str, str], compliance: Literal['lenient', 'strict'] = 'lenient') DataSet[source]

Loads tabular data from a file using a DataModel definition.

List a column for each field of the DataModel in the column_names dictionary. The keys of the dictionary should be {id}_column for each field, and the values should be the name of the corresponding column in the file.

E.g.:

    data_model = DataModel("Test data model", [DataField(name="Field 1", value_set=ValueSet())])
    column_names = {"field_1_column": "column_name_in_file"}
    load_tabular_data_using_data_model("data.csv", data_model, column_names)

Parameters:
  • file – file to load data from

  • data_model – DataModel to use for reading the file

  • column_names – A dictionary mapping from the id of each field of the DataField to the name of a column in the file

  • compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.

Returns:

A DataSet containing the loaded DataModelInstances
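Building the column_names dict follows the {id}_column convention described above; the field ids ("age", "sex") and the file's column headers below are invented for illustration:

```python
# Hedged sketch: construct the column_names mapping expected by
# load_tabular_data_using_data_model from a list of DataField ids.
field_ids = ["age", "sex"]                    # hypothetical DataField ids
file_headers = ["Age at Diagnosis", "Sex"]    # hypothetical column headers
column_names = {f"{fid}_column": header
                for fid, header in zip(field_ids, file_headers)}
# column_names == {"age_column": "Age at Diagnosis", "sex_column": "Sex"}
```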

phenopacket_mapper.utils.io.load_hierarchical_data(file: str | Path | IOBase, data_model: DataModel, instance_identifier: int | str = None, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]

Loads a single hierarchical data instance from one hierarchical file using a DataModel definition.

Parameters:
  • file – file to load data from

  • data_model – DataModel to use for reading the file

  • instance_identifier – identifier of the data instance

  • file_extension – file extension of the file

  • compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.

  • mapping – specifies the mapping from data fields present in the data model to ids of fields in the data

phenopacket_mapper.utils.io.load_hierarchical_dataset(file: str | Path | List[str] | List[Path] | List[IOBase], data_model: DataModel, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]

Loads a dataset from one or more hierarchical files using a DataModel definition.

phenopacket_mapper.utils.io.write(phenopackets_list: List[Phenopacket], out_dir: str | Path)[source]

Writes a list of phenopackets to JSON files.

Parameters:
  • phenopackets_list – The list of phenopackets.

  • out_dir – The output directory.
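The behaviour can be pictured with a standard-library sketch; plain dicts stand in for Phenopacket messages, and both write_json_dir and the "{id}.json" naming are assumptions, not part of this package:

```python
# Hedged illustration: write serializes each phenopacket to a JSON file in
# out_dir. The file-naming scheme below is an assumption for the sketch.
import json
import tempfile
from pathlib import Path

def write_json_dir(records, out_dir):
    """Write each record to out_dir as '<id>.json', creating out_dir if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for record in records:
        (out_dir / f"{record['id']}.json").write_text(json.dumps(record))

with tempfile.TemporaryDirectory() as d:
    write_json_dir([{"id": "patient-1"}], d)
    files = sorted(p.name for p in Path(d).iterdir())
# files == ["patient-1.json"]
```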

Submodules