phenopacket_mapper.utils.io package
This module handles the input and output of data.
- phenopacket_mapper.utils.io.parse_xml(file: IOBase) Dict [source]
Parse an XML file into a dictionary with inferred types.
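A minimal usage sketch; parse_xml takes an open file object rather than a path, and the file name below is a placeholder:

```python
from phenopacket_mapper.utils.io import parse_xml

# the function expects an IOBase object, so open the file first
with open("patient_record.xml", "r", encoding="utf-8") as f:  # hypothetical file
    record = parse_xml(f)

# nested XML elements come back as a dictionary with inferred value types
print(record)
```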
- class phenopacket_mapper.utils.io.DataReader(file: str | Path | IOBase | List[str] | List[Path] | List[IOBase], encoding: str = 'utf-8', file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None)[source]
Bases: object
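A brief construction sketch, assuming the usual behavior that the format is taken from the path's extension when file_extension is not given; the file names and in-memory content are placeholders:

```python
from io import StringIO
from phenopacket_mapper.utils.io import DataReader

reader = DataReader("cohort.csv")  # extension inferred from the path (assumed)
buffered = DataReader(StringIO("<record/>"), file_extension="xml")  # in-memory data: pass the extension explicitly
```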
- phenopacket_mapper.utils.io.read_data_model(data_model_name: str, resources: Tuple[CodeSystem, ...], path: str | Path, file_type: Literal['csv', 'excel', 'unknown'] = 'unknown', column_names: Dict[str, str] = mappingproxy({'name': 'data_field_name', 'description': 'description', 'specification': 'value_set', 'required': 'required'}), parse_value_sets: bool = False, remove_line_breaks: bool = False, parse_ordinals: bool = True) DataModel [source]
Reads a Data Model from a file; see the usage sketch after the parameter list.
- Parameters:
data_model_name – Name to be given to the DataModel object
resources – List of CodeSystem objects to be used as resources in the DataModel
path – Path to Data Model file
file_type – Type of file to read, either ‘csv’ or ‘excel’ (defaults to ‘unknown’)
column_names – A dictionary mapping from each attribute of the DataField class (key) to a column of the file (value). Leaving a value empty (‘’) will leave the field in the DataModel definition empty.
parse_value_sets – If True, parses the value set strings into ValueSet objects, which can later be used to check the validity of the data. Optional, but highly recommended.
remove_line_breaks – Whether to remove line breaks from string values
parse_ordinals – Whether to extract the ordinal number from the field name. Warning: this can overwrite values. Ordinals can look like “1.1.”, “1.”, “I.a.”, or “ii.”, etc.
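A hedged sketch of a typical call; the file name, the CodeSystem constructor arguments, and the import path of CodeSystem are assumptions for illustration:

```python
from phenopacket_mapper.utils.io import read_data_model
from phenopacket_mapper.data_standards import CodeSystem  # import path assumed

# a resource that the value sets in the file may refer to (illustrative)
hpo = CodeSystem(name="Human Phenotype Ontology", namespace_prefix="HP", url="https://hpo.jax.org")

data_model = read_data_model(
    data_model_name="Example rare disease model",
    resources=(hpo,),
    path="data_model.csv",      # hypothetical file
    file_type="csv",
    parse_value_sets=True,      # recommended: enables later validation against the value sets
    remove_line_breaks=True,
)
```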
- phenopacket_mapper.utils.io.read_phenopackets(dir_path: Path) List[Phenopacket] [source]
Reads a list of Phenopackets from JSON files in a directory.
- Parameters:
dir_path (Union[str, Path]) – The directory containing JSON files.
- Returns:
The list of loaded Phenopackets.
- Return type:
List[Phenopacket]
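A short sketch with a placeholder directory:

```python
from pathlib import Path
from phenopacket_mapper.utils.io import read_phenopackets

phenopackets = read_phenopackets(Path("out/phenopackets"))  # directory of *.json phenopackets
print(f"Loaded {len(phenopackets)} phenopackets")
```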
- phenopacket_mapper.utils.io.read_phenopacket_from_json(path: str | Path) Phenopacket [source]
Reads a Phenopacket from a JSON file.
- Parameters:
path (Union[str, Path]) – The path to the JSON file.
- Returns:
The loaded Phenopacket.
- Return type:
Phenopacket
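A short sketch with a placeholder path; the printed field is the id defined by the GA4GH Phenopacket schema:

```python
from phenopacket_mapper.utils.io import read_phenopacket_from_json

phenopacket = read_phenopacket_from_json("out/phenopackets/patient_001.json")  # placeholder path
print(phenopacket.id)
```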
- phenopacket_mapper.utils.io.load_tabular_data_using_data_model(file: str | Path | IOBase | List[str] | List[Path] | List[IOBase], data_model: DataModel, column_names: Dict[str, str], compliance: Literal['lenient', 'strict'] = 'lenient') DataSet [source]
Loads data from a file using a DataModel definition
List a column for each field of the DataModel in the column_names dictionary. The keys of the dictionary should be {id}_column for each field and the values should be the name of the column in the file.
E.g.:

```python
# assuming DataModel, DataField, and ValueSet are imported from phenopacket_mapper
data_model = DataModel("Test data model", [DataField(name="Field 1", value_set=ValueSet())])
column_names = {"field_1_column": "column_name_in_file"}
load_tabular_data_using_data_model("data.csv", data_model, column_names)
```
- Parameters:
file – The file or list of files to load the data from; can be given as path(s) or file object(s)
data_model – DataModel to use for reading the file
column_names – A dictionary mapping from the id of each DataField of the DataModel to the name of a column in the file (see the key format described above)
compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.
- Returns:
A DataSet containing the loaded DataModelInstances
- phenopacket_mapper.utils.io.load_hierarchical_data(file: str | Path | IOBase, data_model: DataModel, instance_identifier: int | str = None, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]
Loads a single hierarchical data instance from one hierarchical file using a DataModel definition
- Parameters:
file – file to load data from
data_model – DataModel to use for reading the file
instance_identifier – identifier of the data instance
file_extension – file extension of the file
compliance – Compliance level to enforce when reading the file. If ‘lenient’, the file can have extra fields that are not in the DataModel. If ‘strict’, the file must have all fields in the DataModel.
mapping – specifies the mapping from data fields present in the data model to ids of fields in the data
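A hedged sketch; the DataModel construction mirrors the tabular example above, the import path is assumed, and the mapping value "subject.sex" is an illustrative field id in the hierarchical data, not a documented format:

```python
from phenopacket_mapper.utils.io import load_hierarchical_data
from phenopacket_mapper.data_standards import DataModel, DataField, ValueSet  # import path assumed

sex_field = DataField(name="Sex", value_set=ValueSet())
data_model = DataModel("Minimal model", [sex_field])

instance = load_hierarchical_data(
    "patient_001.xml",                   # placeholder file
    data_model=data_model,
    instance_identifier="patient_001",
    compliance="lenient",
    mapping={sex_field: "subject.sex"},  # DataField -> id of the corresponding field in the data
)
```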
- phenopacket_mapper.utils.io.load_hierarchical_dataset(file: str | Path | List[str] | List[Path] | List[IOBase], data_model: DataModel, file_extension: Literal['csv', 'xlsx', 'json', 'xml'] = None, compliance: Literal['lenient', 'strict'] = 'lenient', mapping: Dict[DataField, str] = None)[source]
Loads a hierarchical dataset from one or more hierarchical files using a DataModel definition; see load_hierarchical_data for the meaning of the shared parameters.
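A minimal sketch, assuming the files of a cohort share one DataModel; the return value is not annotated in the signature, so it is only assigned here, not inspected:

```python
from phenopacket_mapper.utils.io import load_hierarchical_dataset

# data_model and mapping as defined in the example above
dataset = load_hierarchical_dataset(
    ["patient_001.xml", "patient_002.xml"],  # placeholder files
    data_model=data_model,
    compliance="lenient",
)
```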
- phenopacket_mapper.utils.io.write(phenopackets_list: List[Phenopacket], out_dir: str | Path)[source]
Writes a list of phenopackets to JSON files.
- Parameters:
phenopackets_list – The list of phenopackets.
out_dir – The output directory.
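A short round-trip sketch using the readers documented above; the output directory is a placeholder:

```python
from pathlib import Path
from phenopacket_mapper.utils.io import write, read_phenopackets

out_dir = Path("out/phenopackets")
write(phenopackets, out_dir)  # phenopackets: List[Phenopacket] created or loaded earlier

# read them back in to verify the files were written
reloaded = read_phenopackets(out_dir)
```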