phenopacket_mapper.utils.parsing package
This module contains utility functions for parsing strings into Python values.
- phenopacket_mapper.utils.parsing.parse_data_type(type_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') List[Any | CodeSystem | type | str] [source]
Parses a string representing one or more data types or code systems into a list of Python types.
This method is intended for parsing the DataField.data_type entries of a tabular Data Model file. In that file, the user can list common primitive data types such as string or int, or date, as a data type. The user can also list the namespace prefix of a specific resource (e.g., "SCT" for SNOMED CT), provided that resource is included in the list passed as the resources parameter, to indicate that codes or terms from that resource are permitted.
When compliance is set to 'lenient' (default), this method only issues a warning if a data type is unrecognized and adds the unrecognized string as a literal to the list of allowed data types. When compliance is set to 'strict', it raises a ValueError instead.
E.g.:
>>> parse_data_type("integer, str, Boolean", [])
[<class 'int'>, <class 'str'>, <class 'bool'>]
- Parameters:
type_str
resources
compliance
- Returns:
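The lenient and strict behavior described above can be illustrated with a small self-contained sketch. It does not call the library: the name `sketch_parse_data_type`, the alias table, and the use of plain prefix strings in place of CodeSystem objects are all assumptions made for this example, and the library's actual alias list may differ.

```python
import warnings
from typing import Any, Dict, List

# Hypothetical alias table; the library's real table may differ.
_TYPE_ALIASES: Dict[str, type] = {
    "str": str, "string": str,
    "int": int, "integer": int,
    "float": float,
    "bool": bool, "boolean": bool,
}


def sketch_parse_data_type(
    type_str: str, prefixes: List[str], compliance: str = "lenient"
) -> List[Any]:
    """Map a comma-separated type string to Python types or known prefixes."""
    parsed: List[Any] = []
    for raw in type_str.split(","):
        token = raw.strip()
        if not token:
            continue
        if token.lower() in _TYPE_ALIASES:
            parsed.append(_TYPE_ALIASES[token.lower()])
        elif token in prefixes:
            # Stand-in: the documented function returns a CodeSystem here.
            parsed.append(token)
        elif compliance == "strict":
            raise ValueError(f"Unrecognized data type: {token!r}")
        else:
            warnings.warn(f"Unrecognized data type: {token!r}; kept as a literal")
            parsed.append(token)
    return parsed


print(sketch_parse_data_type("integer, str, Boolean", []))
```

In lenient mode an unknown entry survives as a plain string, which is why the documented return type includes str alongside type and CodeSystem.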
- phenopacket_mapper.utils.parsing.parse_single_data_type(type_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') Any | CodeSystem | type | str [source]
Parses a string representing a single data type into the corresponding Python type.
E.g.:
>>> parse_single_data_type('date', [])
<class 'phenopacket_mapper.data_standards.date.Date'>
- Parameters:
type_str
resources
compliance
- Returns:
- phenopacket_mapper.utils.parsing.parse_ordinal(field_name_str: str) Tuple[str, str] [source]
Parses a DataField.name string into two separate strings containing the ordinal and the name, respectively.
This method is intended for reading a DataModel from a file, where data model fields may carry an ordinal prefix (e.g., "1.1. Pseudonym"). It separates such a string into ordinal="1.1" and name="Pseudonym".
>>> parse_ordinal("1.1. Pseudonym")
('1.1', 'Pseudonym')
>>> parse_ordinal("1. Pseudonym")
('1', 'Pseudonym')
>>> parse_ordinal("I.a. Pseudonym")
('I.a', 'Pseudonym')
>>> parse_ordinal("ii. Pseudonym")
('ii', 'Pseudonym')
- Parameters:
field_name_str – name of the field, containing an ordinal, to parse
- Returns:
a tuple containing the ordinal and the name
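One way to obtain the splits shown in the examples above is a regular expression that accepts dot-separated numeric, Roman-numeral, or single-letter segments followed by a final dot. This is only a sketch under assumptions: `sketch_parse_ordinal` is a hypothetical name, the pattern is not taken from the library, and returning an empty ordinal when none is found is a guess about behavior the documentation does not specify.

```python
import re
from typing import Tuple

# One ordinal segment: digits, a Roman numeral, or a single letter.
_TOKEN = r"(?:\d+|[ivxlcdmIVXLCDM]+|[A-Za-z])"
# Segments joined by dots, then a closing dot, whitespace, and the name.
_ORDINAL_RE = re.compile(rf"^\s*({_TOKEN}(?:\.{_TOKEN})*)\.\s+(.+)$")


def sketch_parse_ordinal(field_name_str: str) -> Tuple[str, str]:
    """Split '1.1. Pseudonym' into ('1.1', 'Pseudonym')."""
    match = _ORDINAL_RE.match(field_name_str)
    if match is None:
        # Assumption: no ordinal means an empty ordinal string.
        return "", field_name_str.strip()
    return match.group(1), match.group(2).strip()


for example in ("1.1. Pseudonym", "1. Pseudonym", "I.a. Pseudonym", "ii. Pseudonym"):
    print(sketch_parse_ordinal(example))
```

The pattern is deliberately permissive, so a field name that happens to start with Roman-numeral letters followed by a dot would also be treated as carrying an ordinal.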
- phenopacket_mapper.utils.parsing.parse_primitive_data_value(value_str: str) str | bool | int | float [source]
Parses an int, float, bool, or string from a string value.
Relies on Python's built-in type conversion functions to parse value_str.
- Parameters:
value_str – The string value to be parsed.
- Returns:
The parsed value as an int, float, bool, or string.
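A conversion chain built on Python's built-in converters might look like the sketch below. The order of attempts (int, then float, then bool, then string) and the name `sketch_parse_primitive` are assumptions; the documentation above states which types can be returned but not the order in which they are tried.

```python
from typing import Union


def sketch_parse_primitive(value_str: str) -> Union[str, bool, int, float]:
    """Try int, float, and bool in turn; fall back to the original string."""
    text = value_str.strip()
    # int() and float() raise ValueError for text they cannot convert.
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Nothing matched, so the value stays a string.
    return value_str


print(sketch_parse_primitive("42"))
print(sketch_parse_primitive("3.5"))
print(sketch_parse_primitive("hello"))
```

Trying int before float keeps "42" from becoming 42.0. Note that float() also accepts strings such as "nan" and "inf", which a stricter parser may want to reject.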
- phenopacket_mapper.utils.parsing.parse_int(int_str: str) int | None [source]
Parses an int from a string value.
If the string value cannot be parsed as an int, None is returned.
- Parameters:
int_str – The string value to be parsed.
- Returns:
The parsed value as an int, or None if the string value cannot be parsed as an int.
- phenopacket_mapper.utils.parsing.parse_float(float_str: str) float | None [source]
Parses a float from a string value.
If the string value cannot be parsed as a float, None is returned.
- Parameters:
float_str – The string value to be parsed.
- Returns:
The parsed value as a float, or None if the string value cannot be parsed as a float.
- phenopacket_mapper.utils.parsing.parse_bool(bool_str: str) bool | None [source]
Parses a boolean from a string value.
If the string value cannot be parsed as a boolean, None is returned.
Deliberately does not parse 0 and 1 as False and True, respectively, to avoid confusion with numeric values.
- Parameters:
bool_str – The string value to be parsed.
- Returns:
The parsed value as a boolean, or None if the string value cannot be parsed as a boolean.
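The rule about 0 and 1 can be made concrete with a short sketch. The set of accepted spellings (only "true" and "false", case-insensitive) and the name `sketch_parse_bool` are assumptions for illustration; the library may accept additional spellings.

```python
from typing import Optional


def sketch_parse_bool(bool_str: str) -> Optional[bool]:
    """Return True/False for 'true'/'false' (any case), otherwise None."""
    normalized = bool_str.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    # "0" and "1" land here on purpose: they stay available as numbers.
    return None


for candidate in ("True", "false", "1", "0"):
    print(candidate, sketch_parse_bool(candidate))
```

Because False is a valid result, callers should test the return value with `is None` rather than relying on truthiness.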
- phenopacket_mapper.utils.parsing.parse_date(date_str: str, default_first: Literal['day', 'month'] = 'day', compliance: Literal['lenient', 'strict'] = 'lenient') Date | None [source]
Parses a date string into a Date object.
Date formats vary widely, and this function attempts to handle as many of them as possible. It first attempts to parse the date string as an ISO 8601 formatted string. If that fails, it attempts to parse it as a date string with separators.
In this process it can be ambiguous whether 01-02-2024 is January 2nd or February 1st, so the function uses the default_first parameter to decide. If default_first is set to "day", the function assumes that the day comes first; if it is set to "month", it assumes that the month comes first.
- Parameters:
date_str – the date string to parse
default_first – the default unit to use if it is unclear which unit comes first between day and month
compliance – the compliance level of the parser
- Returns:
the Date object created from the date string
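The two-stage strategy and the role of default_first can be sketched with the standard library. This is not the library's implementation: `sketch_parse_date` is a hypothetical name, `datetime.date` stands in for the package's Date class, the separator set is assumed, and returning None for unparseable input is an assumption about lenient behavior.

```python
import re
from datetime import date
from typing import Optional


def sketch_parse_date(date_str: str, default_first: str = "day") -> Optional[date]:
    """Try ISO 8601 first, then a separator-based day/month/year layout."""
    text = date_str.strip()
    # Stage 1: ISO 8601 calendar date such as "2024-02-01".
    try:
        return date.fromisoformat(text)
    except ValueError:
        pass
    # Stage 2: three numeric parts separated by "-", "/" or ".".
    parts = re.split(r"[-/.]", text)
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return None
    if len(parts[0]) == 4:
        # A four-digit first part is read as year-month-day.
        year, month, day = (int(p) for p in parts)
    else:
        first, second, year = (int(p) for p in parts)
        if first > 12 and second <= 12:
            day, month = first, second      # only the day can exceed 12
        elif second > 12 and first <= 12:
            month, day = first, second
        elif default_first == "day":
            day, month = first, second      # ambiguous: use the default
        else:
            month, day = first, second
    try:
        return date(year, month, day)
    except ValueError:
        return None


print(sketch_parse_date("01-02-2024", default_first="day"))
print(sketch_parse_date("01-02-2024", default_first="month"))
```

The default only matters when both leading numbers are 12 or less; a value above 12 can only be a day, so the sketch resolves those cases without consulting default_first.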
- phenopacket_mapper.utils.parsing.parse_coding(coding_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') Coding [source]
Parses a string representing a coding into a Coding object.
Expected format: <namespace_prefix>:<code>
E.g.:
>>> parse_coding("SNOMED:404684003", [code_system_module.SNOMED_CT])
Coding(system=CodeSystem(name=SNOMED CT, name space prefix=SNOMED, version=0.0.0), code='404684003', display='', text='')
Intended to be called with a list of all resources used.
Can only recognize the namespace prefixes that belong to code systems provided in the resources list. If a namespace prefix is not found in the resources, the method returns a Coding object with the namespace prefix string as the system and the code as the code.
E.g.:
>>> parse_coding("SNOMED:404684003", [])
Warning: Code system with namespace prefix 'SNOMED' not found in resources.
Warning: Returning Coding object with system as namespace prefix and code as '404684003'
Coding(system='SNOMED', code='404684003', display='', text='')
- Parameters:
coding_str – a string representing a coding
resources – a list of all resources used
compliance – whether to raise a ValueError or only issue a warning if a namespace prefix is not found in the resources
- Returns:
a Coding object as specified in the coding string
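The prefix lookup and the fallback described for parse_coding can be sketched without the library. `SketchCodeSystem`, `SketchCoding`, and `sketch_parse_coding` are hypothetical stand-ins for the package's classes and function, and raising ValueError for a string without a colon is an assumption, since the documentation does not say how malformed input is handled.

```python
import warnings
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SketchCodeSystem:
    name: str
    namespace_prefix: str


@dataclass(frozen=True)
class SketchCoding:
    system: Union[SketchCodeSystem, str]
    code: str


def sketch_parse_coding(
    coding_str: str, resources: List[SketchCodeSystem], compliance: str = "lenient"
) -> SketchCoding:
    """Parse '<namespace_prefix>:<code>' against the known code systems."""
    prefix, sep, code = coding_str.partition(":")
    if not sep or not prefix or not code:
        # Assumption: malformed input is rejected outright.
        raise ValueError(f"Expected '<namespace_prefix>:<code>', got {coding_str!r}")
    for system in resources:
        if system.namespace_prefix == prefix:
            return SketchCoding(system=system, code=code)
    if compliance == "strict":
        raise ValueError(f"Namespace prefix {prefix!r} not found in resources")
    warnings.warn(f"Namespace prefix {prefix!r} not found in resources")
    # Lenient fallback: keep the bare prefix string as the system.
    return SketchCoding(system=prefix, code=code)


snomed = SketchCodeSystem(name="SNOMED CT", namespace_prefix="SNOMED")
print(sketch_parse_coding("SNOMED:404684003", [snomed]))
```

Passing every code system in use, as the documentation recommends, keeps the lenient fallback from silently producing codings whose system is only a string.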
- phenopacket_mapper.utils.parsing.parse_value(value_str: str, resources: Tuple[CodeSystem, ...], compliance: Literal['strict', 'lenient'] = 'lenient') Coding | CodeableConcept | CodeSystem | str | bool | int | float | Date | type [source]
Parses a string representing a value to the appropriate type
This method acts as a wrapper for the parsing of different types of values. It tries to parse the value as each of the following types, in this order:
1. Primitive data value (parse_primitive_data_value)
2. Date (parse_date)
3. Coding (parse_coding)
4. String (if nothing else worked)
- Parameters:
value_str – String representation of the value
resources – List of CodeSystems to use for parsing the value
compliance – Compliance level for parsing the value
- Returns:
The parsed value
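The ordered fallback described for parse_value amounts to running a list of parsers and keeping the first result that is not None. The sketch below shows that pattern with simplified stand-in parsers; every name in it is hypothetical, the date step handles ISO 8601 only, and the coding step returns a plain tuple instead of a Coding object.

```python
from datetime import date
from typing import Any, Callable, Optional, Sequence


def _try_primitive(text: str) -> Optional[Any]:
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return None


def _try_date(text: str) -> Optional[date]:
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


def _try_coding(text: str) -> Optional[tuple]:
    prefix, sep, code = text.partition(":")
    # Stand-in result: (prefix, code) instead of a Coding object.
    return (prefix, code) if sep and prefix and code else None


def sketch_parse_value(
    value_str: str,
    parsers: Sequence[Callable[[str], Optional[Any]]] = (
        _try_primitive, _try_date, _try_coding,
    ),
) -> Any:
    """Return the first non-None parser result, else the original string."""
    for parser in parsers:
        result = parser(value_str)
        if result is not None:  # False and 0 are valid results
            return result
    return value_str


for raw in ("42", "false", "2024-02-01", "SCT:404684003", "free text"):
    print(repr(sketch_parse_value(raw)))
```

The order matters: an earlier parser that accepts a string prevents later ones from ever seeing it, so the most specific formats should be the hardest to match by accident.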
- phenopacket_mapper.utils.parsing.get_codesystem_by_namespace_prefx(namespace_prefix_str: str, resources: List[CodeSystem]) CodeSystem | None [source]
Returns the CodeSystem object that matches the namespace prefix string. If no match is found, returns None.
- Parameters:
namespace_prefix_str – The namespace prefix string to match
resources – The list of CodeSystem objects to search through
- Returns:
The CodeSystem object that matches the namespace prefix string, or None if no match is found
- phenopacket_mapper.utils.parsing.parse_value_set(value_set_str: str, value_set_name: str = '', value_set_description: str = '', resources: Tuple[CodeSystem, ...] = None, compliance: Literal['strict', 'lenient'] = 'lenient') ValueSet [source]
Parses a value set from a string representation
- Parameters:
value_set_str – String representation of the value set
value_set_name – Name of the value set
value_set_description – Description of the value set
resources – List of CodeSystems to use for parsing the value set
compliance – Compliance level for parsing the value set
- Returns:
A ValueSet object as defined by the string representation