phenopacket_mapper.data_standards.data_model module
This module defines the DataModel class, which is used to define a data model for medical data. A DataModel is a collection of DataField, DataSection, and OrGroup objects, which define the fields of the data model. Each DataField has a name, a specification (a value set or a type), an id, a description, a required flag, and a cardinality. The DataModel class also holds a tuple of CodeSystem objects, which are used as resources in the data model.
The DataFieldValue class is used to define the value of a DataField in a DataModelInstance. The DataModelInstance class is used to define an instance of a DataModel, i.e. a record in a dataset.
- class phenopacket_mapper.data_standards.data_model.DataField(name: str, specification: ~phenopacket_mapper.data_standards.value_set.ValueSet | type | ~typing.List[type], id: str = None, required: bool = False, description: str = '', cardinality: ~phenopacket_mapper.data_standards.cardinality.Cardinality = <factory>)[source]
Bases:
DataNode
This class defines fields used in the definition of a DataModel
A data field is the equivalent of a column in a table. It has a name, a specification (a value set or a type), an id, a description, a required flag, and a cardinality.
The id is generated from the name by the str_to_valid_id function from the phenopacket_mapper.utils module. If the generated id is not the desired one, the id field can be set manually.
Naming rules for the id field:
- The id field must be a valid Python identifier
- The id field must start with a letter or the underscore character
- The id field cannot start with a number
- The id field can only contain lowercase alphanumeric characters and underscores (a-z, 0-9, and _)
- The id field cannot be any of the Python keywords (e.g. in, is, not, class, etc.)
- The id field must be unique within a DataModel
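The naming rules above can be checked mechanically. The following is a minimal sketch using only the standard library; `is_valid_field_id` and `name_to_id_sketch` are hypothetical helpers written for illustration and are not part of phenopacket_mapper. In particular, the real str_to_valid_id function may convert names differently.

```python
import keyword
import re

# Lowercase letters, digits and underscores only; the first character must not be a digit.
_ID_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_field_id(candidate: str, existing_ids: set) -> bool:
    """Return True if `candidate` satisfies the naming rules listed above (hypothetical helper)."""
    if not _ID_PATTERN.fullmatch(candidate):
        return False  # illegal characters, uppercase letters, or a leading digit
    if keyword.iskeyword(candidate):
        return False  # e.g. "in", "is", "not", "class"
    return candidate not in existing_ids  # ids must be unique within one model


def name_to_id_sketch(name: str) -> str:
    """Naive name-to-id conversion; an assumption, not the library's str_to_valid_id."""
    lowered = name.strip().lower()
    replaced = re.sub(r"[^a-z0-9_]+", "_", lowered).strip("_")
    if not replaced or replaced[0].isdigit() or keyword.iskeyword(replaced):
        replaced = "_" + replaced
    return replaced
```

A conversion like this cannot guarantee uniqueness on its own, which is one reason an id may still need to be set by hand.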
If the value set consists of a single type, that type can be passed directly as the specification parameter.
- Variables:
name – Name of the field
specification – Value set of the field; if the value set consists of a single type, that type can be passed directly
id – The identifier of the field, adhering to the naming rules stated above
description – Description of the field
required – Required flag of the field
- cardinality: Cardinality
- class phenopacket_mapper.data_standards.data_model.DataSection(name: str, id: str = None, fields: ~typing.Tuple[~phenopacket_mapper.data_standards.data_model.DataField | ~phenopacket_mapper.data_standards.data_model.DataSection | ~phenopacket_mapper.data_standards.data_model.OrGroup, ...] = <factory>, required: bool = False, cardinality: ~phenopacket_mapper.data_standards.cardinality.Cardinality = <factory>)[source]
Bases:
object
This class defines a section in a DataModel
A section is a collection of DataField, DataSection, or OrGroup objects. It is used to group related fields in a DataModel.
- Variables:
name – Name of the section
fields – Tuple of DataField, DataSection, or OrGroup objects
- fields: Tuple[DataField | DataSection | OrGroup, ...]
- cardinality: Cardinality
- class phenopacket_mapper.data_standards.data_model.DataModel(name: str, fields: ~typing.Tuple[~phenopacket_mapper.data_standards.data_model.DataField | ~phenopacket_mapper.data_standards.data_model.DataSection | ~phenopacket_mapper.data_standards.data_model.OrGroup, ...], id: str = None, resources: ~typing.Tuple[~phenopacket_mapper.data_standards.code_system.CodeSystem, ...] = <factory>)[source]
Bases:
object
This class defines a data model for medical data using DataField
A data model can be used to import data and map it to the Phenopacket schema. It is made up of a tuple of DataField, DataSection, and OrGroup objects.
Given that all DataField objects in a DataModel have unique names, the id field is generated from the name. E.g.: DataField(name='Date of Birth', …) will have an id of 'date_of_birth'. The DataField objects can be accessed using the id as an attribute of the DataModel object. E.g.: data_model.date_of_birth. This is useful in the data reading and mapping processes.
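Attribute access by id, as described above, can be pictured with a small stand-in. This is a sketch under stated assumptions: `FieldSketch` and `ModelSketch` are hypothetical classes, and the use of `__getattr__` is one possible mechanism, not necessarily the one DataModel uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for DataField, reduced to a name and an id."""
    name: str
    id: str


@dataclass(frozen=True)
class ModelSketch:
    """Hypothetical stand-in for DataModel that exposes fields as attributes."""
    name: str
    fields: Tuple[FieldSketch, ...]

    def __getattr__(self, attr: str) -> FieldSketch:
        # __getattr__ only runs when normal attribute lookup fails,
        # so `name` and `fields` themselves are unaffected.
        for field in self.__dict__.get("fields", ()):
            if field.id == attr:
                return field
        raise AttributeError(attr)


model = ModelSketch("Example", (FieldSketch("Date of Birth", "date_of_birth"),))
dob = model.date_of_birth  # looked up by id, not by name
```

This also shows why ids must be valid, unique Python identifiers: otherwise the attribute lookup would be ambiguous or impossible to write.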
- Variables:
name – Name of the data model
fields – Tuple of DataField, DataSection, or OrGroup objects
resources – Tuple of CodeSystem objects
- fields: Tuple[DataField | DataSection | OrGroup, ...]
- resources: Tuple[CodeSystem, ...]
- get_field(field_id: str, default: Optional = None) DataField | None [source]
Returns a DataField object by its id
- Parameters:
field_id – The id of the field
default – The default value to return if the field is not found
- Returns:
The DataField object, or default if no field with the given id is found
- load_data(path: str | Path, compliance: Literal['lenient', 'strict'] = 'lenient', **kwargs) DataSet [source]
Loads data from a file using a DataModel definition
To call this method, pass the column name for each field in the DataModel as a keyword argument. This is done by passing the field id followed by ‘_column’. E.g. if the DataModel has a field with id ‘date_of_birth’, the column name in the file should be passed as ‘date_of_birth_column’. The method will raise an error if any of the fields are missing.
E.g.:
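```python
from phenopacket_mapper.data_standards.data_model import DataField, DataModel
from phenopacket_mapper.data_standards.value_set import ValueSet

data_model = DataModel("Test data model", (DataField(name="Field 1", specification=ValueSet()),))
data_model.load_data("data.csv", field_1_column="column_name_in_file")
```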
- Parameters:
path – Path to the file containing the data
compliance – Compliance level to use when loading the data.
kwargs – Dynamically passed parameters that match {id}_column for each item
- Returns:
A DataSet containing the loaded DataModelInstance objects
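The `{id}_column` keyword convention described for load_data can be illustrated in isolation. The sketch below is a hypothetical helper, not the library's implementation: `resolve_columns` and the choice of `ValueError` are assumptions, since the documentation only states that an error is raised when fields are missing.

```python
from typing import Dict, Iterable


def resolve_columns(field_ids: Iterable[str], **kwargs: str) -> Dict[str, str]:
    """Map each field id to the column name passed as `<field_id>_column`."""
    columns = {}
    missing = []
    for field_id in field_ids:
        key = f"{field_id}_column"
        if key in kwargs:
            columns[field_id] = kwargs[key]
        else:
            missing.append(key)
    if missing:
        # The exception type is an assumption of this sketch.
        raise ValueError(f"Missing column arguments: {', '.join(missing)}")
    return columns


mapping = resolve_columns(
    ["date_of_birth", "sex"],
    date_of_birth_column="DOB",
    sex_column="Gender",
)
```

Collecting all missing keys before raising gives the caller one complete error message instead of one failure per field.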
- class phenopacket_mapper.data_standards.data_model.DataFieldValue(id: str | int, field: DataField, value: int | float | str | bool | Date | CodeSystem)[source]
Bases:
object
This class defines the value of a DataField in a DataModelInstance
Equivalent to a cell value in a table.
- Variables:
id – The id of the value, i.e. the row number
field – DataField: The DataField to which this value belongs and which defines the value set for the field.
value – The value of the field.
- validate() bool [source]
Validates the data model instance based on data model definition
This method checks if the instance is valid based on the data model definition. It checks if all required fields are present, if the values are in the value set, etc.
- Returns:
True if the instance is valid, False otherwise
- class phenopacket_mapper.data_standards.data_model.DataSectionInstance(id: str | int, section: DataSection, values: Tuple[DataFieldValue | DataSectionInstance, ...])[source]
Bases:
object
- Variables:
id – The id of the instance, i.e. the row number
section – The DataSection object that defines the data model for this instance
values – A tuple of DataFieldValue or DataSectionInstance objects, each adhering to the corresponding definition in the DataSection
- section: DataSection
- values: Tuple[DataFieldValue | DataSectionInstance, ...]
- class phenopacket_mapper.data_standards.data_model.DataModelInstance(id: int | str, data_model: DataModel, values: Tuple[DataFieldValue | DataSectionInstance, ...], compliance: Literal['lenient', 'strict'] = 'lenient')[source]
Bases:
object
This class defines an instance of a DataModel, i.e. a record or row in a dataset.
- Variables:
id – The id of the instance, i.e. the row number
data_model – The DataModel object that defines the data model for this instance
values – A tuple of DataFieldValue or DataSectionInstance objects, each adhering to the corresponding definition in the DataModel
compliance – Compliance level to enforce when validating the instance. If ‘lenient’, the instance can have extra fields that are not in the DataModel. If ‘strict’, the instance must have all fields in the DataModel.
- values: Tuple[DataFieldValue | DataSectionInstance, ...]
- validate() bool [source]
Validates the data model instance based on data model definition
This method checks if the instance is valid based on the data model definition. It checks if all required fields are present, if the values are in the value set, etc.
- Returns:
True if the instance is valid, False otherwise
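The interplay of required fields, value sets, and the compliance level can be sketched on plain dictionaries. Everything here is an assumption made for illustration: `validate_record` is a hypothetical function, value sets are modelled as Python sets, and 'strict' is read as "exactly the fields of the model, no more and no fewer", which may be stricter than what the library enforces.

```python
from typing import Any, Dict, Literal, Set


def validate_record(
    record: Dict[str, Any],
    allowed: Dict[str, Set[Any]],
    required: Set[str],
    compliance: Literal["lenient", "strict"] = "lenient",
) -> bool:
    """Validate one record against a simplified model (hypothetical sketch).

    `allowed` maps each field id of the model to its permitted values;
    `required` holds the field ids that must be present.
    """
    # Required fields must be present under either compliance level.
    if not required.issubset(record):
        return False
    # Values of fields known to the model must come from that field's value set.
    for field_id, value in record.items():
        if field_id in allowed and value not in allowed[field_id]:
            return False
    if compliance == "strict":
        # Assumed reading of 'strict': all model fields present, no unknown fields.
        return set(record) == set(allowed)
    return True  # lenient: extra fields are tolerated


allowed_values = {"sex": {"male", "female"}, "age_group": {"child", "adult"}}
ok_lenient = validate_record({"sex": "male", "note": "free text"}, allowed_values, {"sex"})
ok_strict = validate_record({"sex": "male", "note": "free text"}, allowed_values, {"sex"}, "strict")
```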
- class phenopacket_mapper.data_standards.data_model.DataSet(data_model: DataModel, data: List[DataModelInstance])[source]
Bases:
object
This class defines a dataset as defined by a DataModel. It is a collection of DataModelInstance objects.
- Variables:
data_model – The DataModel object that defines the data model for this dataset
data – A list of DataModelInstance objects, each adhering to the DataField definition in the DataModel
- data: List[DataModelInstance]
- property height
- property width
- preprocess(fields: str | DataField | List[str | DataField], mapping: Dict | Callable, **kwargs)[source]
Preprocesses a field in the dataset
Preprocessing happens in place, i.e. the values in the dataset are modified directly.
If fields is a list of fields, the mapping must be a method that can handle a list of values being passed as value to it. E.g.:
```python
def preprocess_method(values, method, **kwargs):
    field1, field2 = values
    # do something with values
    return "preprocessed_values" + kwargs["arg1"] + kwargs["arg2"]

dataset.preprocess(["field_1", "field_2"], preprocess_method, arg1="value1", arg2="value2")
```
- Parameters:
fields – Data fields to be preprocessed, will be passed onto mapping
mapping – A dictionary or method to use for preprocessing
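The difference between a dictionary mapping and a callable mapping, and what "in place" means, can be shown on plain rows. This is a sketch under stated assumptions: `preprocess_column` is a hypothetical function operating on a list of dictionaries rather than on a DataSet, and leaving unmapped values unchanged is a choice of the sketch, not documented library behaviour.

```python
from typing import Any, Callable, Dict, List, Union


def preprocess_column(
    rows: List[Dict[str, Any]],
    field_id: str,
    mapping: Union[Dict[Any, Any], Callable[..., Any]],
    **kwargs: Any,
) -> None:
    """Preprocess one field of every row in place (hypothetical sketch)."""
    for row in rows:
        value = row[field_id]
        if callable(mapping):
            # Extra keyword arguments are forwarded to the callable.
            row[field_id] = mapping(value, **kwargs)
        else:
            # Values without a dictionary entry stay unchanged in this sketch.
            row[field_id] = mapping.get(value, value)


rows = [{"sex": "m"}, {"sex": "f"}, {"sex": "unknown"}]
preprocess_column(rows, "sex", {"m": "male", "f": "female"})
preprocess_column(rows, "sex", lambda value, suffix: value + suffix, suffix="!")
```

Because the function returns nothing and mutates `rows`, callers that need the original values must copy the data first.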
- class phenopacket_mapper.data_standards.data_model.OrGroup(fields: Tuple[phenopacket_mapper.data_standards.data_model.DataField | phenopacket_mapper.data_standards.data_model.DataSection | ForwardRef('OrGroup'), ...], name: str = 'Or Group', id: str = None, description: str = '', required: bool = False, cardinality: phenopacket_mapper.data_standards.cardinality.Cardinality = Cardinality(min=0, max='n'))[source]
Bases:
DataNode
- fields: Tuple[DataField | DataSection | OrGroup, ...]
- cardinality: Cardinality
- phenopacket_mapper.data_standards.data_model.recursive_collect_all_members_data_model(data_model: DataModel | DataSection | OrGroup | DataField) Iterable[DataSection | OrGroup | DataField] [source]
Recursively collect all members of a DataModel, DataSection, OrGroup, or DataField
- Parameters:
data_model – DataModel, DataSection, OrGroup, or DataField to collect all members from
- Returns:
Iterable of DataSection, OrGroup, and DataField members
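A recursive collection over nested fields, sections, and groups can be sketched with a generator. `Leaf`, `Group`, and `collect_members` are hypothetical stand-ins; the traversal order and whether the starting node itself is included are assumptions, as the documentation does not specify them.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for DataField."""
    id: str


@dataclass(frozen=True)
class Group:
    """Hypothetical stand-in for a container such as DataSection or OrGroup."""
    id: str
    fields: Tuple[Union["Group", Leaf], ...] = ()


def collect_members(node: Union[Group, Leaf]) -> Iterator[Union[Group, Leaf]]:
    """Yield every descendant of `node` depth-first, excluding `node` itself."""
    for child in getattr(node, "fields", ()):
        yield child
        yield from collect_members(child)


tree = Group("root", (Leaf("a"), Group("section", (Leaf("b"), Leaf("c")))))
ids = [member.id for member in collect_members(tree)]
```

Using a generator keeps the traversal lazy, which matches the Iterable return type in the signature above.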