phenopacket_mapper.data_standards.data_model module

This module defines the DataModel class, which is used to define a data model for medical data. A DataModel is a collection of DataField, DataSection, and OrGroup objects, which define the fields of the data model and how they are grouped. Each DataField has a name, a specification (its value set), an id, a required flag, a description, and a cardinality. The DataModel class also holds a tuple of CodeSystem objects, which are used as resources in the data model.

The DataFieldValue class is used to define the value of a DataField in a DataModelInstance. The DataModelInstance class is used to define an instance of a DataModel, i.e. a record in a dataset.

class phenopacket_mapper.data_standards.data_model.DataField(name: str, specification: phenopacket_mapper.data_standards.value_set.ValueSet | type | typing.List[type], id: str = None, required: bool = False, description: str = '', cardinality: phenopacket_mapper.data_standards.cardinality.Cardinality = <factory>)

Bases: DataNode

This class defines fields used in the definition of a DataModel

A data field is the equivalent of a column in a table. It has a name, a specification (its value set), an id, a required flag, a description, and a cardinality.

If no id is given, the id field is generated from the name field using the str_to_valid_id function from the phenopacket_mapper.utils module. This conversion does not always produce the desired result; in that case, set the id field manually.
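The exact behaviour of str_to_valid_id is not documented here. As a rough illustration only, the hypothetical name_to_id function below (not part of phenopacket_mapper) lowercases the name, collapses all other characters into underscores, and patches the two cases the naming rules forbid: a leading digit and a Python keyword.

```python
import keyword
import re


def name_to_id(name: str) -> str:
    """Hypothetical name-to-id conversion; not the library's str_to_valid_id."""
    # Lowercase, then collapse every run of characters outside a-z/0-9 into "_"
    candidate = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # Ids must not start with a digit (or be empty), so prefix an underscore
    if not candidate or candidate[0].isdigit():
        candidate = "_" + candidate
    # Python keywords such as "class" or "in" are not allowed as ids
    if keyword.iskeyword(candidate):
        candidate += "_"
    return candidate
```

For example, name_to_id("Date of Birth") returns 'date_of_birth', the same id used in the DataModel example further down.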

Naming rules for the id field:
  • The id field must be a valid Python identifier
  • The id field must start with a letter or the underscore character
  • The id field cannot start with a number
  • The id field can only contain lowercase alphanumeric characters and underscores (a-z, 0-9, and _)
  • The id field cannot be any of the Python keywords (e.g. in, is, not, class, etc.)
  • The id field must be unique within a DataModel
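These rules can be checked mechanically. The sketch below is a hypothetical is_valid_id helper (not part of phenopacket_mapper) that encodes them with a regular expression, the standard-library keyword module, and the collection of ids already used in the model.

```python
import keyword
import re
from typing import Iterable

# Lowercase letters, digits and underscores only, not starting with a digit
_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_id(candidate: str, existing_ids: Iterable[str] = ()) -> bool:
    """Hypothetical check of an id against the naming rules (not a library function)."""
    return (
        _ID_PATTERN.fullmatch(candidate) is not None  # allowed characters, no leading digit
        and not keyword.iskeyword(candidate)           # e.g. in, is, not, class
        and candidate not in set(existing_ids)         # unique within the DataModel
    )
```

Any string matching the pattern is already a valid ASCII Python identifier, so the first rule needs no separate check.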

If the specification consists of a single type, that type can be passed directly as the specification parameter instead of a ValueSet.
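To illustrate the accepted forms of specification, here is a hypothetical matches_specification function. It assumes that a single type and a list of types are checked with isinstance, and it uses a plain collection of permitted values as a stand-in for ValueSet, whose real interface is not described here.

```python
from typing import Any, Collection, List, Union

# Stand-in for the accepted forms: a type, a list of types, or (assumption)
# a plain collection of permitted values in place of a ValueSet
Specification = Union[type, List[type], Collection[Any]]


def matches_specification(value: Any, specification: Specification) -> bool:
    """Hypothetical check of a value against a field specification."""
    if isinstance(specification, type):
        # A single type passed directly, e.g. specification=int
        return isinstance(value, specification)
    if isinstance(specification, list) and all(isinstance(t, type) for t in specification):
        # A list of alternative types, e.g. specification=[int, float]
        return isinstance(value, tuple(specification))
    # Otherwise treat the specification as a collection of permitted values
    return value in specification
```

Note that bool is a subclass of int in Python, so this sketch accepts True for specification=int; a stricter check would need to special-case it.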

Variables:
  • name – Name of the field

  • specification – Value set of the field; if the value set consists of a single type, that type can also be passed directly

  • id – The identifier of the field, adhering to the naming rules stated above

  • description – Description of the field

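  • required – Required flag of the field

  • cardinality – Cardinality of the field, given as a Cardinality object with min and max bounds (e.g. Cardinality(min=0, max='n'))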

name: str
specification: ValueSet | type | List[type]
id: str
required: bool
description: str
cardinality: Cardinality
class phenopacket_mapper.data_standards.data_model.DataSection(name: str, id: str = None, fields: typing.Tuple[phenopacket_mapper.data_standards.data_model.DataField | phenopacket_mapper.data_standards.data_model.DataSection | phenopacket_mapper.data_standards.data_model.OrGroup, ...] = <factory>, required: bool = False, cardinality: phenopacket_mapper.data_standards.cardinality.Cardinality = <factory>)

Bases: object

This class defines a section in a DataModel

A section is a collection of DataField, DataSection, or OrGroup objects. It is used to group related fields in a DataModel.

Variables:
  • name – Name of the section

  • fields – Tuple of DataField, DataSection, or OrGroup objects

name: str
id: str
fields: Tuple[DataField | DataSection | OrGroup, ...]
required: bool
cardinality: Cardinality
class phenopacket_mapper.data_standards.data_model.DataModel(name: str, fields: typing.Tuple[phenopacket_mapper.data_standards.data_model.DataField | phenopacket_mapper.data_standards.data_model.DataSection | phenopacket_mapper.data_standards.data_model.OrGroup, ...], id: str = None, resources: typing.Tuple[phenopacket_mapper.data_standards.code_system.CodeSystem, ...] = <factory>)

Bases: object

This class defines a data model for medical data using DataField objects

A data model can be used to import data and map it to the Phenopacket schema. It is made up of DataField, DataSection, and OrGroup objects.

Because all DataField objects in a DataModel have unique names, the id generated from each name can serve as a unique key. E.g.: DataField(name='Date of Birth', …) will have an id of 'date_of_birth'. The DataField objects can then be accessed using the id as an attribute of the DataModel object, e.g. data_model.date_of_birth. This is useful in the data reading and mapping processes.
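Attribute-style access by id can be sketched with Python's __getattr__ hook. The Field and Model classes below are hypothetical stand-ins, not the library's DataField and DataModel, and this is not necessarily how the library implements it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Field:
    """Hypothetical stand-in for DataField; the id is assumed to be already generated."""
    name: str
    id: str


@dataclass
class Model:
    """Hypothetical stand-in for DataModel that exposes its fields as attributes."""
    name: str
    fields: Tuple[Field, ...]

    def __getattr__(self, item: str) -> Field:
        # Python only calls __getattr__ when normal lookup fails, so real
        # attributes such as `name` and `fields` are resolved first.
        # Reading __dict__ directly avoids recursion before `fields` is set.
        for field in self.__dict__.get("fields", ()):
            if field.id == item:
                return field
        raise AttributeError(f"{type(self).__name__!r} has no field with id {item!r}")
```

One consequence of this design: in the sketch, a field whose id collides with a real attribute (e.g. name) is shadowed by that attribute and can only be reached by searching fields directly.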

Variables:
  • name – Name of the data model

  • fields – Tuple of DataField, DataSection, or OrGroup objects

  • resources – Tuple of CodeSystem objects

name: str
fields: Tuple[DataField | DataSection | OrGroup, ...]
id: str
resources: Tuple[CodeSystem, ...]
property is_hierarchical: bool
get_field(field_id: str, default: Optional = None) → DataField | None

Returns a DataField object by its id

Parameters:
  • field_id – The id of the field

  • default – The default value to return if the field is not found

Returns:

The DataField object, or default if no field with that id is found

get_field_ids() → List[str]

Returns a list of the ids of the DataFields in the DataModel
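A minimal sketch of the lookups that get_field and get_field_ids describe, written as free functions over a hypothetical Field type. It only searches the top level; how the real methods treat fields nested inside DataSection or OrGroup objects is not specified here.

```python
from typing import List, NamedTuple, Optional, Sequence


class Field(NamedTuple):
    """Hypothetical stand-in for DataField."""
    name: str
    id: str


def get_field_ids(fields: Sequence[Field]) -> List[str]:
    # Ids in definition order
    return [field.id for field in fields]


def get_field(
    fields: Sequence[Field], field_id: str, default: Optional[Field] = None
) -> Optional[Field]:
    # Return the first field with a matching id, or `default` if none matches
    return next((field for field in fields if field.id == field_id), default)
```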

load_data(path: str | Path, compliance: Literal['lenient', 'strict'] = 'lenient', **kwargs) → DataSet

Loads data from a file using a DataModel definition

To call this method, pass the column name for each field in the DataModel as a keyword argument whose name is the field id followed by '_column'. E.g. if the DataModel has a field with id 'date_of_birth', pass the name of the corresponding column in the file as date_of_birth_column. The method will raise an error if the column argument for any of the fields is missing.

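E.g.:

```python
from phenopacket_mapper.data_standards.data_model import DataField, DataModel
from phenopacket_mapper.data_standards.value_set import ValueSet

data_model = DataModel("Test data model", [DataField(name="Field 1", specification=ValueSet())])
data_model.load_data("data.csv", field_1_column="column_name_in_file")
```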

Parameters:
  • path – Path to the file containing the data

  • compliance – Compliance level to use when loading the data, either 'lenient' (the default) or 'strict'.

  • kwargs – Dynamically passed parameters of the form {id}_column, one for each field in the DataModel

Returns:

A DataSet, i.e. a collection of DataModelInstance objects
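The {id}_column convention amounts to building a mapping from field ids to column names. The hypothetical collect_column_mapping helper below (not part of phenopacket_mapper) sketches that step; raising ValueError is an assumption, since the documentation only says that an error is raised.

```python
from typing import Dict, Iterable, List


def collect_column_mapping(field_ids: Iterable[str], **kwargs: str) -> Dict[str, str]:
    """Hypothetical helper: map each field id to the column passed as `<id>_column`."""
    mapping: Dict[str, str] = {}
    missing: List[str] = []
    for field_id in field_ids:
        key = f"{field_id}_column"
        if key in kwargs:
            mapping[field_id] = kwargs[key]
        else:
            missing.append(key)
    if missing:
        # Error type is an assumption; the docs only state that an error is raised
        raise ValueError(f"Missing column arguments: {', '.join(missing)}")
    return mapping
```

Usage mirroring the load_data example: collect_column_mapping(["field_1"], field_1_column="column_name_in_file") returns {'field_1': 'column_name_in_file'}.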

class phenopacket_mapper.data_standards.data_model.DataFieldValue(id: str | int, field: DataField, value: int | float | str | bool | Date | CodeSystem)

Bases: object

This class defines the value of a DataField in a DataModelInstance

Equivalent to a cell value in a table.

Variables:
  • id – The id of the value, i.e. the row number

  • field – The DataField to which this value belongs and which defines the value set for the field.

  • value – The value of the field.

id: str | int
field: DataField
value: int | float | str | bool | Date | CodeSystem
validate() → bool

Validates this value based on the definition of its DataField

This method checks if the value is valid according to the field definition, e.g. whether it is in the value set of the field.

Returns:

True if the instance is valid, False otherwise

class phenopacket_mapper.data_standards.data_model.DataSectionInstance(id: str | int, section: DataSection, values: Tuple[DataFieldValue | DataSectionInstance, ...])

Bases: object

Variables:
  • id – The id of the instance, i.e. the row number

  • section – The DataSection object that defines the data model for this instance

  • values – A tuple of DataFieldValue or DataSectionInstance objects, each adhering to the corresponding definition in the DataSection

id: str | int
section: DataSection
values: Tuple[DataFieldValue | DataSectionInstance, ...]
validate() → bool
class phenopacket_mapper.data_standards.data_model.DataModelInstance(id: int | str, data_model: DataModel, values: Tuple[DataFieldValue | DataSectionInstance, ...], compliance: Literal['lenient', 'strict'] = 'lenient')

Bases: object

This class defines an instance of a DataModel, i.e. a record in a dataset

This class is used to define an instance of a DataModel, i.e. a record or row in a dataset.

Variables:
  • id – The id of the instance, i.e. the row number

  • data_model – The DataModel object that defines the data model for this instance

  • values – A tuple of DataFieldValue or DataSectionInstance objects, each adhering to the corresponding definition in the DataModel

  • compliance – Compliance level to enforce when validating the instance. If ‘lenient’, the instance can have extra fields that are not in the DataModel. If ‘strict’, the instance must have all fields in the DataModel.

id: int | str
data_model: DataModel
values: Tuple[DataFieldValue | DataSectionInstance, ...]
compliance: Literal['lenient', 'strict']
validate() → bool

Validates the data model instance based on data model definition

This method checks if the instance is valid based on the data model definition. It checks if all required fields are present, if the values are in the value set, etc.

Returns:

True if the instance is valid, False otherwise
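One reading of the compliance levels, sketched as a hypothetical validate_record function over a record stored as a dict from field id to value. It checks only which fields are present (value-set checks are omitted) and treats 'strict' as "exactly the fields of the model", which is an interpretation of the compliance description above rather than confirmed library behaviour.

```python
from typing import Any, Iterable, Literal, Mapping


def validate_record(
    record: Mapping[str, Any],
    model_field_ids: Iterable[str],
    required_ids: Iterable[str],
    compliance: Literal["lenient", "strict"] = "lenient",
) -> bool:
    """Hypothetical presence check for one record (field id -> value)."""
    present = set(record)
    # Every required field must be present, whatever the compliance level
    if not set(required_ids) <= present:
        return False
    if compliance == "strict":
        # Interpretation: all model fields present and no extra fields
        return present == set(model_field_ids)
    # 'lenient': extra fields that are not in the model are tolerated
    return True
```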

class phenopacket_mapper.data_standards.data_model.DataSet(data_model: DataModel, data: List[DataModelInstance])

Bases: object

This class defines a dataset as defined by a DataModel

This class is used to define a dataset as defined by a DataModel. It is a collection of DataModelInstance objects.

Variables:
  • data_model – The DataModel object that defines the data model for this dataset

  • data – A list of DataModelInstance objects, each adhering to the definition in the DataModel

data_model: DataModel
data: List[DataModelInstance]
property height
property width
property data_frame: DataFrame
preprocess(fields: str | DataField | List[str | DataField], mapping: Dict | Callable, **kwargs)

Preprocesses a field in the dataset

Preprocessing happens in place, i.e. the values in the dataset are modified directly.

If fields is a list of fields, the mapping must be a method that can handle a list of values being passed to it as the value. E.g.:

```python
def preprocess_method(values, method, **kwargs):
    field1, field2 = values
    # do something with values
    return "preprocessed_values" + kwargs["arg1"] + kwargs["arg2"]


dataset.preprocess(["field_1", "field_2"], preprocess_method, arg1="value1", arg2="value2")
```

Parameters:
  • fields – Data fields to be preprocessed; their values are passed on to mapping

  • mapping – A dictionary or method to use for preprocessing

  • kwargs – Additional keyword arguments passed on to mapping when it is a method
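For the single-field case, in-place preprocessing with either a dictionary or a method can be sketched as follows. preprocess_rows is hypothetical and stores rows as dicts keyed by field id; keeping values that have no entry in the dictionary unchanged is an assumption, and the multi-field case is left out because where its result is written is not described here.

```python
from typing import Any, Callable, Dict, List, Union


def preprocess_rows(
    rows: List[Dict[str, Any]],
    field_id: str,
    mapping: Union[Dict[Any, Any], Callable[..., Any]],
    **kwargs: Any,
) -> None:
    """Hypothetical in-place preprocessing of one field across all rows."""
    for row in rows:
        value = row[field_id]
        if isinstance(mapping, dict):
            # Dictionary lookup; values without an entry are kept (assumption)
            row[field_id] = mapping.get(value, value)
        else:
            # Callable: receives the value plus any extra keyword arguments
            row[field_id] = mapping(value, **kwargs)
```

Because rows are modified in place, the function returns None, matching the in-place behaviour described for preprocess.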

head(n: int = 5)
class phenopacket_mapper.data_standards.data_model.OrGroup(fields: Tuple[phenopacket_mapper.data_standards.data_model.DataField | phenopacket_mapper.data_standards.data_model.DataSection | ForwardRef('OrGroup'), ...], name: str = 'Or Group', id: str = None, description: str = '', required: bool = False, cardinality: phenopacket_mapper.data_standards.cardinality.Cardinality = Cardinality(min=0, max='n'))

Bases: DataNode

fields: Tuple[DataField | DataSection | OrGroup, ...]
name: str
id: str
description: str
required: bool
cardinality: Cardinality
phenopacket_mapper.data_standards.data_model.recursive_collect_all_members_data_model(data_model: DataModel | DataSection | OrGroup | DataField) → Iterable[DataSection | OrGroup | DataField]

Recursively collect all members of a DataModel, DataSection, OrGroup, or DataField

Parameters:

data_model – DataModel, DataSection, OrGroup, or DataField to collect all members from

Returns:

Iterable of DataSection, OrGroup, and DataField members
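The recursive collection can be sketched as a depth-first, pre-order generator over hypothetical Field and Section stand-ins, where anything with a fields attribute (standing in for DataModel, DataSection, or OrGroup) is treated as a container. Whether the real function also yields the object passed in is not stated; this sketch yields only its descendants.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass
class Field:
    """Hypothetical stand-in for DataField (a leaf)."""
    name: str


@dataclass
class Section:
    """Hypothetical stand-in for any container: DataModel, DataSection or OrGroup."""
    name: str
    fields: Tuple[Union["Section", Field], ...] = ()


def collect_members(node: Union[Section, Field]) -> Iterator[Union[Section, Field]]:
    """Yield every member nested under `node`, depth-first and in definition order."""
    # Leaves have no `fields` attribute, so the loop body never runs for them
    for child in getattr(node, "fields", ()):
        yield child
        yield from collect_members(child)
```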