pywhip.pywhip.whip_csv(csv_file, specifications, delimiter, maxentries=None)
Whip a CSV-like file
Validate a CSV file, using the CSV reading and iterator capabilities of the Python standard library.
Parameters
csv_file: Filename of the CSV file to validate with whip.
specifications: Valid whip specifications, as a dictionary schema.
delimiter: A one-character string used to separate fields, e.g. ','.
maxentries: Limit on the number of records to validate from the file, useful for a quick check on the first subset of the data.
Returns
Whip validator class instance, containing the errors and reporting capabilities.
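The reading step described above can be sketched with the standard library alone. This is a minimal sketch, not pywhip's implementation: `read_rows` is a hypothetical helper and the per-row validation is left out. It only shows how `csv.DictReader` turns each line into a mapping keyed by the headers, and how `itertools.islice` can cap the number of records at `maxentries`.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, Optional, TextIO


def read_rows(handle: TextIO, delimiter: str,
              maxentries: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Yield each data line as a {header: value} mapping (hypothetical helper).

    When maxentries is None all records are yielded, otherwise only the first
    maxentries records are read.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    # islice(iterable, None) yields everything, so None needs no special case
    return islice(reader, maxentries)


# Usage with an in-memory file instead of a filename on disk
data = io.StringIO("id,eventDate\n1,2017-01-01\n2,07241981\n3,2017-03\n")
rows = list(read_rows(data, delimiter=",", maxentries=2))
```

Because `islice` stops pulling from the reader once the limit is hit, the rest of the file is never parsed, which is what makes a small `maxentries` useful for a quick check.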
pywhip.pywhip.whip_dwca(dwca_zip, specifications, maxentries=None)
Whip a Darwin Core Archive
Validate the core file of a zipped Darwin Core Archive data set, using the DwCAReader reading and iterator capabilities.
Parameters
dwca_zip: Filename of the zipped Darwin Core Archive.
specifications: Valid whip specifications, as a dictionary schema.
maxentries: Limit on the number of records to validate from the Archive, useful for a quick check on the first subset of the data.
Returns
Whip validator class instance, containing the errors and reporting capabilities.
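To illustrate what reading "the core file" of an archive involves, here is a stdlib-only sketch. It is not DwCAReader and not pywhip's code: the hypothetical `read_core_rows` locates the core file through `meta.xml`, applies its delimiter, quoting and header-line settings, and maps column indices to short term names. It assumes the standard DwC text namespace and ignores extensions, default values and other `meta.xml` features.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile
from itertools import islice
from typing import BinaryIO, Dict, List, Optional, Union

NS = "{http://rs.tdwg.org/dwc/text/}"  # default namespace of meta.xml


def read_core_rows(dwca_zip: Union[str, BinaryIO],
                   maxentries: Optional[int] = None) -> List[Dict[str, str]]:
    """Return the core rows of a zipped DwC-A as {term: value} dicts (sketch)."""
    with zipfile.ZipFile(dwca_zip) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find(NS + "core")
        location = core.find(f"{NS}files/{NS}location").text
        # meta.xml stores escapes such as the two characters backslash-t; decode them
        delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
        quote = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Column index -> short term name, e.g. ".../terms/eventDate" -> "eventDate"
        columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
                   for f in core.findall(NS + "field") if f.get("index") is not None}
        text = archive.read(location).decode(core.get("encoding", "UTF-8"))

    if quote:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    else:  # an empty fieldsEnclosedBy means fields are never quoted
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)

    data_lines = islice(reader, skip, None)  # drop the header line(s)
    return [{term: line[i] for i, term in columns.items() if i < len(line)}
            for line in islice(data_lines, maxentries)]  # None reads everything


# Usage: a tiny archive built in memory
meta = ('<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="\\t" fieldsEnclosedBy="" ignoreHeaderLines="1">'
        '<files><location>occurrence.txt</location></files><id index="0"/>'
        '<field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/eventDate"/>'
        '</core></archive>')
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("meta.xml", meta)
    zf.writestr("occurrence.txt",
                "occurrenceID\teventDate\nA1\t2017-01-01\nA2\t07241981\nA3\t2017\n")
buffer.seek(0)
rows = read_core_rows(buffer, maxentries=2)
```

In pywhip itself this parsing is delegated to DwCAReader; the sketch only makes the moving parts visible.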
pywhip.pywhip.Whip(schema, sample_size=10)
Whip document validation class
Validates (multi-row) documents against a whip specification schema using the high-level functions whip_... and creates a validation report with the get_report() method.
Parameters
sample_size: Number of value examples to use in reporting.
schema: Whip specification schema, consisting of field : constraint combinations.
Attributes
A DwcaValidator class instance.
Base report container to collect document errors. Errors are collected in the ['results']['specified_fields'] values, having a SpecificationErrorHandler for each field-specification combination.
Whip specification schema, consisting of field : constraint combinations.
For each of the field-rules combinations, the (top) number of data value samples/examples to include in the report.
The DwcaValidator is the underlying engine that handles the validation of incoming values against the whip specifications. It extends the existing Cerberus Validator class.
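The report layout described above (errors collected per field-specification combination under ['results']['specified_fields']) can be pictured with plain dictionaries. This is a minimal sketch under stated assumptions: `RULES` and `validate_document` are hypothetical names, the two rule checks are toy versions, row numbers start at 1, and a bare dict of value-to-rows stands in for the SpecificationErrorHandler that pywhip uses.

```python
from typing import Any, Callable, Dict, Iterable, Mapping, Set

# Hypothetical toy rule checks (not pywhip's): return True when the value passes
RULES: Dict[str, Callable[[str, Any], bool]] = {
    "allowed": lambda value, constraint: value in constraint,
    "maxlength": lambda value, constraint: len(value) <= constraint,
}


def validate_document(rows: Iterable[Mapping[str, str]],
                      schema: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Collect failing values per field-specification combination.

    report["results"]["specified_fields"][field][specification] maps each
    failing value to the set of row numbers where it occurs.
    """
    specified: Dict[str, Dict[str, Dict[str, Set[int]]]] = {}
    # Row numbering starting at 1 is an assumption of this sketch
    for row_number, row in enumerate(rows, start=1):
        for field, specifications in schema.items():
            if field not in row:
                continue  # absent fields are simply skipped in this sketch
            value = row[field]
            for specification, constraint in specifications.items():
                if not RULES[specification](value, constraint):
                    per_rule = specified.setdefault(field, {}).setdefault(specification, {})
                    per_rule.setdefault(value, set()).add(row_number)
    return {"results": {"specified_fields": specified}}


schema = {"sex": {"allowed": ["male", "female"]}, "id": {"maxlength": 3}}
rows = [{"sex": "male", "id": "1"}, {"sex": "M", "id": "1234"}, {"sex": "M", "id": "2"}]
report = validate_document(rows, schema)
```

Grouping by failing value rather than by row keeps the report compact when the same wrong value repeats across many rows.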
pywhip.validators.DwcaValidator(*args, **kwargs)
Validates any mapping against specifications defined in a validation schema
In the context of pywhip, a mapping is generally a single line of data, with the fields (data headers) as keys and the data values of that particular line as values.
Notes
This class subclasses the Cerberus Validator and adds pywhip-specific _validate_<specification> methods.
The whip specifications are a combination of Cerberus native specifications and pywhip custom ones:
* minlength, maxlength, regex
* allowed, empty, min, max
* numberformat, dateformat, mindate, maxdate, stringformat
* delimitedValues, if
Each _validate_<specification> method takes the following input arguments:
constraint: The constraint provided in the whip specification, i.e. the right-hand side of the colon in the whip specifications. In the implementation, this input parameter can be named differently to clarify the role of the constraint in the validation function.
field: The name of the field, i.e. the left-hand side of the colon in the whip specifications, which corresponds to the field header name in the data.
value: A single data value for which the whip specification needs to be tested using the provided constraint.
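To make the (constraint, field, value) convention concrete, here is a toy, stdlib-only stand-in for a Cerberus-style validator. It is a hypothetical sketch, not Cerberus or pywhip: `MiniValidator` looks up `_validate_<specification>` by name for every schema entry and calls it with the three arguments. The semantics given to `dateformat` (the value must parse with at least one `strptime` format) are an assumption for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List, Mapping, Sequence


class MiniValidator:
    """Toy stand-in for a Cerberus-style validator (hypothetical, stdlib only).

    For each field: {specification: constraint} entry in the schema it calls
    self._validate_<specification>(constraint, field, value).
    """

    def __init__(self, schema: Mapping[str, Mapping[str, Any]]):
        self.schema = schema
        self.errors: Dict[str, List[str]] = {}

    def _error(self, field: str, message: str) -> None:
        self.errors.setdefault(field, []).append(message)

    def validate(self, document: Mapping[str, str]) -> bool:
        self.errors = {}
        for field, specifications in self.schema.items():
            if field not in document:
                continue
            for specification, constraint in specifications.items():
                check = getattr(self, f"_validate_{specification}")
                check(constraint, field, document[field])
        return not self.errors

    # The constraint parameter is renamed to describe its role, as noted above
    def _validate_maxlength(self, maxlength: int, field: str, value: str) -> None:
        if len(value) > maxlength:
            self._error(field, f"length of '{value}' exceeds {maxlength}")

    def _validate_dateformat(self, formats: Sequence[str], field: str, value: str) -> None:
        # Assumed semantics: the value must parse with at least one strptime format
        for fmt in formats:
            try:
                datetime.strptime(value, fmt)
                return
            except ValueError:
                continue
        self._error(field, f"'{value}' does not match any of {list(formats)}")


validator = MiniValidator({"eventDate": {"dateformat": ["%Y-%m-%d", "%Y-%m", "%Y"]},
                           "id": {"maxlength": 3}})
ok = validator.validate({"eventDate": "07241981", "id": "12"})
```

In real Cerberus the dispatch, error bookkeeping and schema checks are provided by the Validator base class; only the `_validate_<specification>` methods would be written by hand.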
To validate the schema input itself, Cerberus validation rules can be added to the docstring of the corresponding _validate_<specification> method.
Extends the handling of the Cerberus Validator
The following alterations are made:
* allow_unknown is set to True by default
* Initialization requires a schema
* By default, all fields without an empty specification get an empty: False specification. As such, empty strings are not allowed by default, according to the whip specifications.
Parameters
allow_unknown: If False, only terms with specifications are allowed as input. As unknown fields are reported by pywhip after validation, the default value is True.
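The empty: False default described above amounts to a small schema transformation. The sketch below is hypothetical (`add_default_empty` is not a pywhip function) and only illustrates the rule: fields that already carry an explicit empty specification keep it, all others get empty: False.

```python
from typing import Any, Dict, Mapping


def add_default_empty(schema: Mapping[str, Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Give every field without an 'empty' specification an 'empty': False one."""
    completed = {}
    for field, specifications in schema.items():
        specs = dict(specifications)      # copy, so the caller's schema is untouched
        specs.setdefault("empty", False)  # an explicit 'empty' setting is kept as-is
        completed[field] = specs
    return completed


schema = {"eventDate": {"dateformat": ["%Y-%m-%d"]}, "remarks": {"empty": True}}
completed = add_default_empty(schema)
```

Copying before adding the default avoids surprising callers who reuse the same schema dictionary elsewhere.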
pywhip.validators.WhipErrorHandler(tree=None)
Class to store custom error message handling
The WhipErrorHandler updates the BasicErrorHandler with custom messages for pywhip-specific specifications. Each of these messages replaces the message of a specification error, using the unique code attributed in the ErrorDefinition setup.
The message is a descriptive text about the error and can optionally use the following variables:
* {value}: the individual data value of the document
* {constraint}: the constraint provided by the whip specification, i.e. the right-hand side of the colon
Optionally initialize a new instance.
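The placeholder mechanism above relies on Python's `str.format`. The sketch below uses hypothetical error codes and message texts (pywhip's actual codes come from its ErrorDefinition setup) to show that a template may use `{value}`, `{constraint}`, both, or neither.

```python
# Hypothetical error codes and messages, for illustration only
MESSAGES = {
    0x100: "'{value}' does not match the date format(s) '{constraint}'",
    0x101: "empty values are not allowed",  # placeholders are optional
}


def render_message(code: int, value: object, constraint: object) -> str:
    # str.format ignores keyword arguments that the template does not use
    return MESSAGES[code].format(value=value, constraint=constraint)


print(render_message(0x100, "07241981", "%Y-%m-%d"))
```

Because unused keyword arguments are ignored, every message can be rendered with the same call regardless of which variables it mentions.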
pywhip.reporters.SpecificationErrorHandler(constraint)
Class handler for field-rule entity reporting
Notes
The SpecificationErrorHandler class is essentially an enriched dictionary (using a mapping), built directly on top of a defaultdict with the (wrong) values as keys and a set as values, to which the (unique) rows where that value occurs are added.
The constraint linked to the specification (field-rule combination), expressed as a string.
Dictionary with the wrong data values as keys and the corresponding row identifiers as values.
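The defaultdict-of-sets bookkeeping described above can be sketched in a few lines. `ValueRows` is a hypothetical stand-in, not pywhip's SpecificationErrorHandler: it shows how a missing value starts as an empty set and how adding the same row twice stores it only once.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ValueRows:
    """Hypothetical stand-in for the value -> rows bookkeeping (not pywhip's class)."""

    def __init__(self, constraint: str):
        self.constraint = constraint
        # A missing value starts as an empty set; sets keep each row only once
        self.samples: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, value: str, row: int) -> None:
        self.samples[value].add(row)


handler = ValueRows("%Y-%m-%d")
for value, row in [("07241981", 2), ("07241981", 3), ("07241981", 3), ("2017/01", 5)]:
    handler.add(value, row)
```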
build_error_report(total_rows_count, top_n)
Convert defaultdict to regular dict for JSON reporting
Parameters
total_rows_count: Total number of rows in the current document, also used to calculate the number of passed rows.
top_n: Number of samples (ordered by the number of rows) to retain for reporting purposes.
Notes
build_error_report() combines the information contained in the _samples attribute, for example:
    { ("07241981", "string format ...") : [2, 3, 5, 6],
      ("value", "message as provided by error") : [1, 2, 6,]
    }
together with the other attributes into a JSON-style report:
    {"constraint": "%Y-%m-%d, %Y-%m, %Y",
     "failed_rows": 23,
     "passed_rows": 3,
     "samples": {
         "07241981": {
             "failed_rows": 4,
             "first_row": 2,
             "message": "string format ..."
         },
         "value": {
             "failed_rows": n_rows,
             "first_row": minimum of row identifiers,
             "message": "message as provided by error"
         }
     }
    }
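The conversion from the _samples mapping to the JSON-style report can be sketched as a plain function. This is a hedged illustration, not pywhip's method: `sketch_error_report` is a hypothetical name that takes the samples and constraint as arguments. It assumes that failed_rows counts distinct rows across all samples, that passed_rows is total_rows_count minus failed_rows, and that the top_n values affecting the most rows are kept. Two keys sharing a value but carrying different messages would collide in the output dict.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def sketch_error_report(samples: Dict[Tuple[str, str], Iterable[int]], constraint: str,
                        total_rows_count: int, top_n: int) -> Dict[str, Any]:
    # Distinct rows that failed for any value (a row is counted once)
    failed: Set[int] = set()
    for rows in samples.values():
        failed.update(rows)
    # Keep the top_n (value, message) keys that affect the most rows
    ranked = sorted(samples.items(), key=lambda item: len(set(item[1])), reverse=True)[:top_n]
    return {
        "constraint": constraint,
        "failed_rows": len(failed),
        "passed_rows": total_rows_count - len(failed),
        "samples": {
            value: {"failed_rows": len(set(rows)), "first_row": min(rows), "message": message}
            for (value, message), rows in ranked
        },
    }


samples = {("07241981", "string format ..."): [2, 3, 5, 6],
           ("value", "message as provided by error"): [1, 2, 6]}
report = sketch_error_report(samples, "%Y-%m-%d, %Y-%m, %Y", total_rows_count=26, top_n=1)
```

With these toy inputs the counts differ from the abbreviated example above; only the shape of the output is meant to match.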