phenopacket_mapper.utils.parsing package

This module contains utility functions for parsing strings into Python values.

phenopacket_mapper.utils.parsing.parse_data_type(type_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') → List[Any | CodeSystem | type | str]

Parses a string representing one or more data types or code systems into a list of the corresponding Python types or code systems.

The purpose of this method is to parse the DataField.data_type entries of a Data Model tabular file. In the tabular file, the user can list typical primitive data types such as string, int, etc., or date. Furthermore, it is possible to list the namespace prefix (e.g., "SCT" for SNOMED CT) of a specific resource, provided that resource is included in the list passed as the resources parameter, to indicate that codes or terms from that resource are permissible.

When compliance is set to 'lenient' (the default), this method only issues a warning if a data type is unrecognized and adds it as a literal to the list of allowed data types. When compliance is set to 'strict', it raises a ValueError instead.

E.g.:
>>> parse_data_type("integer, str, Boolean", [])
[<class 'int'>, <class 'str'>, <class 'bool'>]
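The splitting and compliance behavior described above can be sketched in plain Python. This is a standalone illustration, not the package's implementation: the name parse_data_type_sketch, the alias table, and the comma separator are assumptions inferred from the example, and both the lookup of namespace prefixes in resources and the date type are left out.

```python
import warnings
from typing import Any, List, Literal

# Hypothetical alias table; the real module may accept other names.
_TYPE_ALIASES = {
    "str": str, "string": str,
    "int": int, "integer": int,
    "float": float, "double": float,
    "bool": bool, "boolean": bool,
}


def parse_data_type_sketch(
    type_str: str,
    compliance: Literal["lenient", "strict"] = "lenient",
) -> List[Any]:
    """Split a comma-separated type list and map each entry to a Python type."""
    parsed: List[Any] = []
    for raw in type_str.split(","):
        name = raw.strip()
        if not name:
            continue  # ignore empty entries, e.g. from a trailing comma
        py_type = _TYPE_ALIASES.get(name.lower())  # case-insensitive lookup
        if py_type is not None:
            parsed.append(py_type)
        elif compliance == "strict":
            raise ValueError(f"Unrecognized data type: {name!r}")
        else:
            warnings.warn(f"Unrecognized data type {name!r}, keeping it as a literal")
            parsed.append(name)  # lenient mode keeps the raw string
    return parsed


print(parse_data_type_sketch("integer, str, Boolean"))
# [<class 'int'>, <class 'str'>, <class 'bool'>]
```

In lenient mode an unknown entry such as "quaternion" is kept as the string itself, which is why the return annotation of the documented function includes str.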

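Parameters:
  • type_str – a string listing one or more data types or namespace prefixes, separated by commas

  • resources – the list of CodeSystems whose namespace prefixes may appear in type_str

  • compliance – 'lenient' to only warn about unrecognized data types, or 'strict' to raise a ValueError

Returns:

a list of the parsed data types: Python types, CodeSystems, or, for unrecognized entries in lenient mode, string literals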

phenopacket_mapper.utils.parsing.parse_single_data_type(type_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') → Any | CodeSystem | type | str

Parses a string representing a single data type into the corresponding Python type.

E.g.:
>>> parse_single_data_type('date', [])
<class 'phenopacket_mapper.data_standards.date.Date'>

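Parameters:
  • type_str – a string naming a single data type or namespace prefix

  • resources – the list of CodeSystems whose namespace prefixes may be referenced in type_str

  • compliance – 'lenient' or 'strict', with the same meaning as in parse_data_type

Returns:

the corresponding Python type or CodeSystem, or, for an unrecognized string in lenient mode, the string itself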

phenopacket_mapper.utils.parsing.parse_ordinal(field_name_str: str) → Tuple[str, str]

Parses a DataField.name string into two separate strings containing the ordinal and the name, respectively.

This method is meant to be used when reading in a DataModel from a file, where data model fields might have an ordinal attached to them (e.g., "1.1. Pseudonym"). It separates such a string into ordinal="1.1" and name="Pseudonym"; as the examples below show, the trailing dot of the ordinal is dropped.

>>> parse_ordinal("1.1. Pseudonym")
('1.1', 'Pseudonym')
>>> parse_ordinal("1. Pseudonym")
('1', 'Pseudonym')
>>> parse_ordinal("I.a. Pseudonym")
('I.a', 'Pseudonym')
>>> parse_ordinal("ii. Pseudonym")
('ii', 'Pseudonym')
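One way to implement this split is a regular expression over dot-terminated tokens. The sketch below is hypothetical: the name parse_ordinal_sketch and the accepted ordinal grammar (Arabic digits, Roman-numeral letters, or a single letter per token) are assumptions chosen to reproduce the four examples above, and returning an empty ordinal when none is found is also an assumption.

```python
import re
from typing import Tuple

# Assumed grammar: one or more dot-terminated tokens, each made of Arabic
# digits, Roman-numeral letters, or a single letter ("1.1.", "I.a.", "ii.").
_ORDINAL_RE = re.compile(
    r"^\s*((?:(?:\d+|[IVXLCDMivxlcdm]+|[A-Za-z])\.)+)\s+(.+?)\s*$"
)


def parse_ordinal_sketch(field_name_str: str) -> Tuple[str, str]:
    """Split e.g. "1.1. Pseudonym" into ("1.1", "Pseudonym")."""
    match = _ORDINAL_RE.match(field_name_str)
    if match is None:
        # Assumption: no recognizable ordinal gives an empty ordinal
        return "", field_name_str.strip()
    ordinal, name = match.groups()
    return ordinal.rstrip("."), name  # drop the trailing dot, as in the examples


for s in ("1.1. Pseudonym", "I.a. Pseudonym", "Dr. Smith"):
    print(parse_ordinal_sketch(s))
# ('1.1', 'Pseudonym')
# ('I.a', 'Pseudonym')
# ('', 'Dr. Smith')
```

Because Roman numerals are matched by letter class, a leading word made only of those letters, such as "Mix.", would be mistaken for an ordinal; a stricter grammar would be needed if such names occur.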

Parameters:

field_name_str – name of the field, containing an ordinal, to parse

Returns:

a tuple containing the ordinal and the name

phenopacket_mapper.utils.parsing.parse_primitive_data_value(value_str: str) → str | bool | int | float

Parses an int, float, bool, or string from a string value.

Relies on Python's built-in type conversion functions to parse value_str.
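A minimal sketch of such a conversion chain, assuming ints are tried before floats and floats before booleans (the documentation lists the types but not the order). The names parse_primitive_sketch, _try_int, and _try_float are hypothetical, and the accepted boolean spellings are an assumption.

```python
from typing import Optional, Union

Primitive = Union[str, bool, int, float]


def _try_int(s: str) -> Optional[int]:
    try:
        return int(s)
    except ValueError:
        return None


def _try_float(s: str) -> Optional[float]:
    try:
        return float(s)
    except ValueError:
        return None


def parse_primitive_sketch(value_str: str) -> Primitive:
    """Try int, then float, then bool; otherwise return the string unchanged."""
    as_int = _try_int(value_str)
    if as_int is not None:
        return as_int  # "3" gives 3, not 3.0, because int is tried first
    as_float = _try_float(value_str)
    if as_float is not None:
        return as_float
    lowered = value_str.strip().lower()
    if lowered in ("true", "false"):  # assumed boolean spellings
        return lowered == "true"
    return value_str


print([parse_primitive_sketch(s) for s in ("3", "3.5", "True", "abc")])
# [3, 3.5, True, 'abc']
```

Note that the built-in float() also accepts strings such as "nan" and "inf", so this sketch would return those as floats rather than strings.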

Parameters:

value_str – The string value to be parsed.

Returns:

The parsed value as an int, float, bool, or string.

phenopacket_mapper.utils.parsing.parse_int(int_str: str) → int | None

Parses an int from a string value.

If the string value cannot be parsed as an int, None is returned.
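A hypothetical sketch of this None-on-failure pattern built on the built-in int(); parse_int_sketch is an illustrative name, not the package's code. The example also shows what int() itself accepts: surrounding whitespace, a sign, and underscores between digits, but not decimal strings such as "4.2".

```python
from typing import Optional


def parse_int_sketch(int_str: str) -> Optional[int]:
    """Return int(int_str), or None if the built-in conversion fails."""
    try:
        return int(int_str)
    except (TypeError, ValueError):  # TypeError: non-string input such as None
        return None


print([parse_int_sketch(s) for s in (" 42 ", "-7", "1_000", "4.2", "abc")])
# [42, -7, 1000, None, None]
```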

Parameters:

int_str – The string value to be parsed.

Returns:

The parsed value as an int, or None if the string value cannot be parsed as an int.

phenopacket_mapper.utils.parsing.parse_float(float_str: str) → float | None

Parses a float from a string value.

If the string value cannot be parsed as a float, None is returned.
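The same pattern for floats, as a hypothetical sketch. The built-in float() accepts scientific notation as well as "nan" and "inf"; the allow_non_finite flag below is an invented illustration of how such values could be rejected, not a parameter of the documented function.

```python
import math
from typing import Optional


def parse_float_sketch(float_str: str, allow_non_finite: bool = True) -> Optional[float]:
    """Return float(float_str), or None on failure (or if non-finite and disallowed)."""
    try:
        value = float(float_str)
    except (TypeError, ValueError):
        return None
    if not allow_non_finite and not math.isfinite(value):
        return None  # rejects "nan", "inf", "-Infinity", ...
    return value


print([parse_float_sketch(s) for s in ("3.14", "1e3", "42", "abc")])
# [3.14, 1000.0, 42.0, None]
print(parse_float_sketch("inf"), parse_float_sketch("inf", allow_non_finite=False))
# inf None
```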

Parameters:

float_str – The string value to be parsed.

Returns:

The parsed value as a float, or None if the string value cannot be parsed as a float.

phenopacket_mapper.utils.parsing.parse_bool(bool_str: str) → bool | None

Parses a boolean from a string value.

If the string value cannot be parsed as a boolean, None is returned.

Deliberately does not parse "0" and "1" to False and True, respectively, to avoid confusion with numeric values.
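A naive bool(bool_str) does not work here, because any non-empty string, including "False", is truthy. The hypothetical sketch below instead compares against explicit spellings; accepting only "true" and "false" (case-insensitively, ignoring surrounding whitespace) is an assumption, and "0" and "1" return None as described above.

```python
from typing import Optional

# Assumed spellings; "0" and "1" are deliberately absent.
_TRUE_WORDS = {"true"}
_FALSE_WORDS = {"false"}


def parse_bool_sketch(bool_str: str) -> Optional[bool]:
    """Map "true"/"false" (any case, surrounding whitespace ignored) to a bool."""
    normalized = bool_str.strip().lower()
    if normalized in _TRUE_WORDS:
        return True
    if normalized in _FALSE_WORDS:
        return False
    return None  # includes "0" and "1", left to the numeric parsers


print(bool("False"))  # True: any non-empty string is truthy
print([parse_bool_sketch(s) for s in ("True", " false ", "1", "0")])
# [True, False, None, None]
```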

Parameters:

bool_str – The string value to be parsed.

Returns:

The parsed value as a boolean, or None if the string value cannot be parsed as a boolean.

phenopacket_mapper.utils.parsing.parse_date(date_str: str, default_first: Literal['day', 'month'] = 'day', compliance: Literal['lenient', 'strict'] = 'lenient') → Date | None

Parses a date string into a Date object.

There is a lot of variation in how dates are formatted, and this function attempts to handle as many of them as possible. The function will first attempt to parse the date string as an ISO 8601 formatted string. If that fails, it will attempt to parse the date string as a date string with separators.

In this process it is sometimes unknowable whether 01-02-2024 is January 2nd or February 1st, so the function will use the default_first parameter to determine this. If the default_first parameter is set to “day”, the function will assume that the day comes first, and if it is set to “month”, the function will assume that the month comes first. If the default_first

Parameters:
  • date_str – the date string to parse

  • default_first – the default unit to use if it is unclear which unit comes first between day and month

  • compliance – the compliance level of the parser

Returns:

the Date object created from the date string
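A minimal sketch of the two-stage strategy described above, using the standard library's datetime.date instead of the package's Date class. The name parse_date_sketch, the accepted separators (-, / and .), the requirement of a four-digit year, and the way compliance is applied are all assumptions. When the first number exceeds 12 (or the second does), the order is unambiguous and default_first is not needed.

```python
import re
from datetime import date
from typing import Literal, Optional

# Assumed non-ISO layouts, with -, / or . as separators:
# year first (2024/02/01) or year last (01/02/2024).
_YEAR_FIRST = re.compile(r"^(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})$")
_YEAR_LAST = re.compile(r"^(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})$")


def parse_date_sketch(
    date_str: str,
    default_first: Literal["day", "month"] = "day",
    compliance: Literal["lenient", "strict"] = "lenient",
) -> Optional[date]:
    text = date_str.strip()
    # Step 1: ISO 8601, e.g. "2024-02-01"
    try:
        return date.fromisoformat(text)
    except ValueError:
        pass

    # Step 2: dates with separators
    parts = None
    if m := _YEAR_FIRST.match(text):
        year, month, day = (int(g) for g in m.groups())
        parts = (year, month, day)
    elif m := _YEAR_LAST.match(text):
        a, b, year = (int(g) for g in m.groups())
        # A value above 12 can only be a day; otherwise default_first decides.
        if a > 12 or (b <= 12 and default_first == "day"):
            day, month = a, b
        else:
            month, day = a, b
        parts = (year, month, day)

    if parts is not None:
        try:
            return date(*parts)
        except ValueError:
            pass  # matched the layout but is not a real date, e.g. "31/02/2024"

    if compliance == "strict":
        raise ValueError(f"Could not parse date: {date_str!r}")
    return None


print(parse_date_sketch("01-02-2024"))                         # 2024-02-01
print(parse_date_sketch("01-02-2024", default_first="month"))  # 2024-01-02
```

On Python 3.11 and later, date.fromisoformat accepts more ISO 8601 forms (for example "20240201"), so step 1 catches more inputs there than on older versions.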

phenopacket_mapper.utils.parsing.parse_coding(coding_str: str, resources: List[CodeSystem], compliance: Literal['lenient', 'strict'] = 'lenient') → Coding

Parses a string representing a coding into a Coding object.

Expected format: <namespace_prefix>:<code>

E.g.:
>>> parse_coding("SNOMED:404684003", [code_system_module.SNOMED_CT])
Coding(system=CodeSystem(name=SNOMED CT, name space prefix=SNOMED, version=0.0.0), code='404684003', display='', text='')

Intended to be called with a list of all resources used.

It can only recognize namespace prefixes that belong to the code systems provided in the resources list. If a namespace prefix is not found in the resources and compliance is 'lenient', it returns a Coding object whose system is the namespace prefix string and whose code is the code from coding_str.

E.g.:
>>> parse_coding("SNOMED:404684003", [])
Warning: Code system with namespace prefix 'SNOMED' not found in resources.
Warning: Returning Coding object with system as namespace prefix and code as '404684003'
Coding(system='SNOMED', code='404684003', display='', text='')
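A hypothetical sketch of the lookup-and-fallback logic described above. CodeSystem and Coding below are minimal stand-ins for the package's classes (their field names, including namespace_prefix, are assumptions), parse_coding_sketch is an illustrative name, and raising on a string without a colon is an assumed choice.

```python
import warnings
from dataclasses import dataclass
from typing import List, Literal, Union


@dataclass(frozen=True)
class CodeSystem:  # hypothetical stand-in for the package's CodeSystem
    name: str
    namespace_prefix: str


@dataclass(frozen=True)
class Coding:  # hypothetical stand-in for the package's Coding
    system: Union[CodeSystem, str]
    code: str
    display: str = ""
    text: str = ""


def parse_coding_sketch(
    coding_str: str,
    resources: List[CodeSystem],
    compliance: Literal["lenient", "strict"] = "lenient",
) -> Coding:
    # partition splits on the first colon only, so codes containing ":" stay intact
    prefix, sep, code = coding_str.strip().partition(":")
    if not (sep and prefix and code):
        raise ValueError(f"Expected '<namespace_prefix>:<code>', got {coding_str!r}")
    for system in resources:
        if system.namespace_prefix == prefix:
            return Coding(system=system, code=code)
    if compliance == "strict":
        raise ValueError(f"Namespace prefix {prefix!r} not found in resources")
    warnings.warn(f"Code system with namespace prefix {prefix!r} not found in resources")
    return Coding(system=prefix, code=code)  # lenient: keep the bare prefix string


SNOMED_CT = CodeSystem(name="SNOMED CT", namespace_prefix="SNOMED")
print(parse_coding_sketch("SNOMED:404684003", [SNOMED_CT]).code)  # 404684003
```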

Parameters:
  • coding_str – a string representing a coding

  • resources – a list of all resources used

  • compliance – whether to raise a ValueError ('strict') or only issue a warning ('lenient') if a namespace prefix is not found in the resources

Returns:

a Coding object as specified in the coding string

phenopacket_mapper.utils.parsing.parse_value(value_str: str, resources: Tuple[CodeSystem, ...], compliance: Literal['strict', 'lenient'] = 'lenient') → Coding | CodeableConcept | CodeSystem | str | bool | int | float | Date | type

Parses a string representing a value into the appropriate type.

This method acts as a wrapper for parsing different types of values. It tries to parse the value as each of the following types, in order:

  1. Primitive data value (parse_primitive_data_value)

  2. Date (parse_date)

  3. Coding (parse_coding)

  4. String (if nothing else worked)
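The wrapper described above is a fallback chain: each parser either returns a value or signals failure, and the first success wins. In this hypothetical sketch every step returns None on failure, the coding step returns a plain (prefix, code) tuple instead of a Coding, resources and compliance are omitted, and the helper functions are simplified stand-ins for the documented parsers rather than the package's own code.

```python
from datetime import date
from typing import Any, Callable, List, Optional, Tuple


def _primitive_or_none(s: str) -> Optional[Any]:
    for convert in (int, float):
        try:
            return convert(s)
        except ValueError:
            pass
    lowered = s.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return None


def _date_or_none(s: str) -> Optional[date]:
    try:
        return date.fromisoformat(s.strip())
    except ValueError:
        return None


def _coding_or_none(s: str) -> Optional[Tuple[str, str]]:
    # Simplified: "<prefix>:<code>" with an alphanumeric prefix starting with a letter
    prefix, sep, code = s.strip().partition(":")
    if sep and code and prefix[:1].isalpha() and prefix.isalnum():
        return prefix, code
    return None


# Steps 1-3 in the documented order; step 4 is the string fallback below.
PARSERS: List[Callable[[str], Optional[Any]]] = [
    _primitive_or_none,
    _date_or_none,
    _coding_or_none,
]


def parse_value_sketch(value_str: str) -> Any:
    for parser in PARSERS:
        result = parser(value_str)
        if result is not None:  # "is not None", so False and 0 still count as success
            return result
    return value_str  # step 4: fall back to the raw string


print([parse_value_sketch(s) for s in ("42", "2024-02-01", "SNOMED:404684003", "free text")])
# [42, datetime.date(2024, 2, 1), ('SNOMED', '404684003'), 'free text']
```

The order matters: "20240201" comes back as the int 20240201 because primitive values are tried before dates.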

Parameters:
  • value_str – String representation of the value

  • resources – List of CodeSystems to use for parsing the value

  • compliance – Compliance level for parsing the value

Returns:

The parsed value

phenopacket_mapper.utils.parsing.get_codesystem_by_namespace_prefx(namespace_prefix_str: str, resources: List[CodeSystem]) → CodeSystem | None

Returns the CodeSystem object that matches the namespace prefix string. If no match is found, returns None.
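This lookup amounts to a linear scan for the first matching prefix. In the hypothetical sketch below, CodeSystem is a minimal stand-in whose namespace_prefix field name is assumed, get_codesystem_by_prefix_sketch is an illustrative name, and matching is exact and case-sensitive, which may differ from the real function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CodeSystem:  # hypothetical stand-in; the real class has more fields
    name: str
    namespace_prefix: str


def get_codesystem_by_prefix_sketch(
    namespace_prefix_str: str, resources: Iterable[CodeSystem]
) -> Optional[CodeSystem]:
    # Linear scan: the first code system with a matching prefix, else None.
    return next(
        (cs for cs in resources if cs.namespace_prefix == namespace_prefix_str),
        None,
    )


SNOMED_CT = CodeSystem("SNOMED CT", "SNOMED")
HPO = CodeSystem("Human Phenotype Ontology", "HP")
print(get_codesystem_by_prefix_sketch("HP", [SNOMED_CT, HPO]).name)  # Human Phenotype Ontology
print(get_codesystem_by_prefix_sketch("LOINC", [SNOMED_CT, HPO]))    # None
```

For many lookups against the same resources, building a dict once, {cs.namespace_prefix: cs for cs in resources}, avoids repeated scans; note that such a dict keeps the last code system for a duplicated prefix, whereas the scan returns the first.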

Parameters:
  • namespace_prefix_str – The namespace prefix string to match

  • resources – The list of CodeSystem objects to search through

Returns:

The CodeSystem object that matches the namespace prefix string, or None if no match is found

phenopacket_mapper.utils.parsing.parse_value_set(value_set_str: str, value_set_name: str = '', value_set_description: str = '', resources: Tuple[CodeSystem, ...] = None, compliance: Literal['strict', 'lenient'] = 'lenient') → ValueSet

Parses a value set from a string representation

Parameters:
  • value_set_str – String representation of the value set

  • value_set_name – Name of the value set

  • value_set_description – Description of the value set

  • resources – List of CodeSystems to use for parsing the value set

  • compliance – Compliance level for parsing the value set

Returns:

A ValueSet object as defined by the string representation

Submodules