This release provides a few minor improvements along with many bug fixes.
New argument extract_tbl_checked added to interrogate(). When FALSE, the
$tbl_checked column from the validation set will be dropped
before returning the agent. This may be helpful in reducing object size
for large agents (#542). (#554)
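As a minimal sketch of this option (assuming a pointblank version that includes this release and the bundled small_table dataset; the single validation step is only illustrative):

```r
library(pointblank)

# Interrogate without keeping the checked tables in the validation set
agent <-
  create_agent(tbl = small_table) |>
  col_vals_gt(columns = d, value = 100) |>
  interrogate(extract_tbl_checked = FALSE)

# Per the note above, the tbl_checked column should no longer be present
has_tbl_checked <- "tbl_checked" %in% names(agent$validation_set)
```

The trade-off is that anything relying on the checked tables after interrogation would no longer have them available.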
The new argument na_rm in snip_list() suppresses any NA
values so that they won’t be included in the
snippet’s list of items (#547). (#556)
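A short sketch of how na_rm might be used in an informant workflow follows. It assumes the bundled small_table dataset (whose c column contains NA values); the snippet name c_values and the info text are hypothetical.

```r
library(pointblank)

informant <-
  create_informant(tbl = small_table) |>
  # Collect a list of values from column `c`, leaving out NA
  info_snippet(
    snippet_name = "c_values",
    fn = snip_list(column = "c", na_rm = TRUE)
  ) |>
  # Refer to the snippet by name inside the info text
  info_columns(columns = "c", info = "Values include {c_values}.") |>
  incorporate()
```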
Improved readability of error messages rendered as tooltips in the agent report. (#543)
col_vals_expr()
shows used columns in the agent
report when interrogated. (#570)
Improved the matching of rows between
agent$validation_step
and the rows of the agent report
(#563). (#565)
Functions accepting ... now use rlang::list2(), enabling dynamic dots.
For example, a multiagent can now be constructed from a list() of agents
using create_multiagent(!!!list_of_agents) (#552). (#553)
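The splicing pattern mentioned above could look like the following sketch. The two agents and their validation steps are made up for illustration and use the bundled small_table dataset.

```r
library(pointblank)

# Two small, illustrative agents targeting the same table
agent_1 <-
  create_agent(tbl = small_table, label = "first run") |>
  col_exists(columns = a) |>
  interrogate()

agent_2 <-
  create_agent(tbl = small_table, label = "second run") |>
  col_vals_not_null(columns = a) |>
  interrogate()

list_of_agents <- list(agent_1, agent_2)

# Splice the list into `...` with the `!!!` operator
multiagent <- create_multiagent(!!!list_of_agents)
```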
Fixed bug with non-standard column names in some validation functions (#545, #546). (#555)
Fixed a regression in col_vals_*() functions, where vars("col") was
evaluating to the string "col". Behavior of vars("col") is now aligned
back with vars(col): both evaluate to the column name col. (#535)
Problems arising from comparing columns to a value of different class
(for example, comparing a datetime column to a date value Sys.Date()
instead of another datetime value Sys.time()) are now signalled appropriately
at interrogate() (#536, #537). (#539)
Fixed bug in has_columns()
failing to detect
non-existing columns when supplied as a character vector.
(#540)
Replace uses of crayon::make_style()
with
cli::make_ansi_style()
, removing the crayon
dependency. (#559, thanks @olivroy!)
Use rlang::check_installed()
to perform checks of
optional package installs. (#559, @olivroy)
Modernized CI workflows with dedicated linting action. (#560, @olivroy)
Avoid unwanted equation formatting in agent report arising from
arbitrary "$"
characters (#561). (#562)
Ensured that the column string is a symbol before constructing
the expression for the col_vals_*()
functions.
No longer resolve columns with tidyselect when the target table cannot be materialized.
Relaxed tests on tidyselect error messages.
Complete {tidyselect} support for the columns argument of all validation
functions, as well as in has_columns() and info_columns(). The columns
argument can now take familiar column-selection expressions as one would
use inside dplyr::select(). This also begins a process of deprecation:
columns = vars(...) will continue to work, but c() now supersedes vars().
External vectors of column names should be wrapped in all_of().
The label argument of validation functions now exposes the following
string variables via {glue} syntax:
"{.step}": The validation step name
"{.col}": The current column name
"{.seg_col}": The current segment’s column name
"{.seg_val}": The current segment’s value/group
These dynamic values may be useful for validations that get expanded into multiple steps.
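To illustrate the label variables, here is a small sketch in which one call expands into two validation steps, one per column. The label wording is an arbitrary choice and small_table is the bundled dataset.

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) |>
  # One call, two columns: this expands into two validation steps,
  # each receiving a label built from its own step name and column
  col_vals_not_null(
    columns = c(a, b),
    label = "{.step} applied to column {.col}"
  ) |>
  interrogate()

n_steps <- nrow(agent$validation_set)
```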
interrogate() gains two new options for printing progress in the console output:
progress: Whether interrogation progress should be printed to the console
(TRUE for interactive sessions, same as before)
show_step_label: Whether each validation step’s label value should be
printed alongside the progress.
Fixes issue with rendering reports in Quarto HTML documents.
When no columns are returned from a {tidyselect}
expression in columns
, the agent’s report now displays the
originally supplied expression instead of being simply blank
(e.g., in
create_agent(small_table) |> col_vals_null(matches("z"))
).
Fixes issue with the hashing implementation to improve performance and alignment of validation steps in the multiagent.
0.9.0
compatibility.
The row_count_match() function can now match the
count of rows in the target table to a literal value (in addition to
comparing row counts to a secondary table).
The analogous col_count_match() function was added
to compare column counts in the target table to a secondary table, or
to match on a literal value.
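A brief sketch of matching on literal counts, using the bundled small_table dataset (13 rows, 8 columns):

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) |>
  row_count_match(count = 13) |>  # literal row count
  col_count_match(count = 8) |>   # literal column count
  interrogate()

counts_ok <- all_passed(agent)
```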
Substitution syntax has been added to the
tbl_store()
function with {{ <name> }}
.
This is a great way to make table-prep more concise, readable, and less
prone to errors.
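The substitution syntax might be used as in the sketch below, where a second table-prep formula builds on the first. The store entry names small_table_base and small_high are hypothetical, and the filter is only an example.

```r
library(pointblank)

store <-
  tbl_store(
    # A base table-prep formula
    small_table_base ~ pointblank::small_table,
    # Reuse the formula above by name with {{ }} and extend it
    small_high ~ {{ small_table_base }} %>% dplyr::filter(f == "high")
  )

# Materialize the derived table from the store
small_high <- tbl_get(tbl = "small_high", store = store)
```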
The get_informant_report() function has been enhanced with
more width options. Aside from the "standard" and "small"
sizes we can now supply any pixel- or
percent-based width to precisely size the reporting.
Added support for validating data in BigQuery tables.
The new function row_count_match()
(plus
expect_row_count_match()
and
test_row_count_match()
) checks for exact matching of rows
across two tables (the target table and a comparison table of your
choosing). Works equally well for local tables and for database and
Spark tables.
The new tbl_match() function (along with
expect_tbl_match() and test_tbl_match()) checks
for an exact matching of the target table with a comparison table. It
will check for a strict match on table schemas, on equivalent row
counts, and then exact matches on cell values across the two
tables.
The set_tbl()
function was given the
tbl_name
and label
arguments to provide an
opportunity to set metadata on the new target table.
Support for mssql
tables has been restored and works
exceedingly well for the majority of validation functions (the few that
are incompatible provide messaging about not being supported).
All functions in the package now have usage examples.
An RStudio Cloud project has been prepared with .Rmd files that contain explainers and runnable examples for each function in the package. Look at the project README for a link to the project.
The read_fn
argument in create_agent()
and create_informant()
has been deprecated in favor of an
enhanced tbl
argument. Now, we can supply a variety of
inputs to tbl
for associating a target table to an agent or
an informant. With tbl
, it’s now possible to provide a
table (e.g., data.frame
, tbl_df
,
tbl_dbi
, tbl_spark
, etc.), an expression (a
table-prep formula or a function) to read in the table only at
interrogation time, or a table source expression to get table
preparations from a table store (as an in-memory object or as defined in
a YAML file).
The set_read_fn()
, remove_read_fn()
,
and remove_tbl()
functions were removed since the
read_fn
argument has been deprecated (and there’s virtually
no need to remove a table from an object with remove_tbl()
now).
The new rows_complete() validation function (along
with the expect_rows_complete() and
test_rows_complete() expectation and test variants) checks
whether rows contain any NA/NULL values
(optionally constrained to a selection of specified columns).
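For instance, with the bundled small_table dataset (where column c has missing values but a and b do not), the test variant could be used like this:

```r
library(pointblank)

# Across all columns: some rows are incomplete because `c` has NA values
complete_all <- test_rows_complete(small_table)

# Restricted to columns without missing values
complete_ab <- test_rows_complete(small_table, columns = c(a, b))
```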
The new function serially()
(along with
expect_serially()
and test_serially()
) allows
for a series of tests to run in sequence before either culminating in a
final validation step or simply exiting the series. This construction
allows for pre-testing that may make sense before a validation step. For
example, there may be situations where it’s vital to check a column type
before performing a validation on the same column.
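The column-type scenario described above can be sketched as follows: a test_*() call acts as the pre-test and a validation function serves as the final step. The threshold value of 100 is arbitrary and small_table is the bundled dataset.

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) |>
  serially(
    # Pre-test: is the column numeric at all?
    ~ test_col_is_numeric(., columns = d),
    # Final validation step, run only if the pre-test passes
    ~ col_vals_gt(., columns = d, value = 100)
  ) |>
  interrogate()

series_ok <- all_passed(agent)
```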
The
specially()
/expect_specially()
/test_specially()
functions enable custom validations/tests/expectations with a
user-defined function. We still have preconditions
and
other common args available for convenience. The great thing about this
is that because we require the UDF to return a logical vector of
passing/failing test units (or a table where the rightmost column is
logical), we can incorporate the results quite easily in the standard
pointblank reporting.
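A minimal sketch of a user-defined check, where the function receives the table and returns one logical value per test unit (the rule itself is just an example):

```r
library(pointblank)

# UDF returning a logical vector: TRUE marks a passing test unit
d_above_100 <- function(x) x$d > 100

udf_ok <- test_specially(small_table, fn = d_above_100)
```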
The info_columns_from_tbl()
function is a
super-convenient wrapper for the info_columns()
function.
Say you’re making a data dictionary with an informant and you
already have the table metadata somewhere as a table: you can use that
here and not have to call info_columns()
many, many
times.
Added the game_revenue_info
dataset which contains
metadata for the extant game_revenue
dataset. Both datasets
pair nicely together in examples that create a data dictionary with
create_informant()
and
info_columns_from_tbl()
.
Added the table transformer function
tt_tbl_colnames()
to get a table’s column names for
validation.
Input data tables with label
attribute values in
their columns will be displayed in the ‘Variables’ section of the
scan_data()
report. This is useful when scanning imported
SAS tables (which often have labeled variables).
The all_passed()
function has been improved such
that failed validation steps (that return an evaluation error, perhaps
because of a missing column) result in FALSE
; the
i
argument has been added to all_passed()
to
optionally get a subset of validation steps before evaluation.
For those expect_*()
functions that can handle
multiple columns, pointblank now correctly stops at the
first failure and provides the correct reporting for that. Passing
multiple columns really should mean processing multiple steps in serial,
and previously this was handled incorrectly.
The new draft_validation()
function will create a
starter validation .R or .Rmd file with just a table as an input. Uses a
new ‘column roles’ feature to develop a starter set of validation steps
based on what kind of data the columns contain (e.g., latitude/longitude
values, URLs, email addresses, etc.).
The validation function col_vals_within_spec()
(and
the variants expect_col_vals_within_spec()
and
test_col_vals_within_spec()
) will test column values
against a specification like phone numbers ("phone"
), VIN
numbers ("VIN"
), URLs ("url"
), email addresses
("email"
), and much more ("isbn"
,
"postal_code[<country_code>]"
,
"credit_card"
, "iban[<country_code>]"
,
"swift"
, "ipv4"
, "ipv6"
, and
"mac"
).
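As a rough illustration with the "email" specification, the following uses two small made-up tables (the addresses are placeholders):

```r
library(pointblank)

good <- data.frame(email = c("ana@example.com", "raj@example.org"))
bad  <- data.frame(email = c("ana@example.com", "not-an-email"))

# TRUE only when every value conforms to the specification
good_ok <- test_col_vals_within_spec(good, columns = email, spec = "email")
bad_ok  <- test_col_vals_within_spec(bad, columns = email, spec = "email")
```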
A large cross section of row-based validation functions can now
operate on segments of the target table, so you can run a particular
validation with slices (or segments) of the target table. The
segmentation is made possible by use of the new segments
argument, which takes an expression that serves to segment the target
table by column values. It can be given in one of two ways: (1) as a
single or multiple column names containing keys to segment on, or (2) as
a two-sided formula where the LHS holds a column name and the RHS
contains the column values to segment on (allowing for a subset of keys
for segmentation).
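Both forms of the segments argument are sketched below with the bundled small_table dataset, whose f column holds the keys "high", "low", and "mid"; the validation itself is arbitrary.

```r
library(pointblank)

# (1) Segment on every key found in column `f`: one step per key
agent_all <-
  create_agent(tbl = small_table) |>
  col_vals_gt(columns = d, value = 100, segments = vars(f)) |>
  interrogate()

# (2) Two-sided formula: segment only on a subset of the keys
agent_some <-
  create_agent(tbl = small_table) |>
  col_vals_gt(columns = d, value = 100, segments = f ~ c("low", "high")) |>
  interrogate()

steps_all  <- nrow(agent_all$validation_set)
steps_some <- nrow(agent_some$validation_set)
```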
The default printing of the multiagent object is now a
stacked display of agent reports. The wide report (useful for
comparisons of validations targeting the same table over time) is
available in the improved get_multiagent_report()
function
(with display_mode = "wide"
).
Exporting the reporting is now much easier with the new
export_report()
function. It will export objects such as
the agent (for validations), the informant (for table
metadata), and the multiagent (a series of validations), and,
also those objects containing customized reports (from
scan_data()
, get_agent_report()
,
get_informant_report()
, and
get_multiagent_report()
). You’ll always get a
self-contained HTML file of the report from any use of
export_report()
.
A new family of functions has been added to
pointblank: Table Transformers! These functions can
radically transform a data table and either provide a wholly different
table (like a summary table or table properties table) or do some useful
filtering in a single step. This can be useful for preparing the target
table for validation or when creating temporary tables (through
preconditions
) for a few validation steps (e.g., validating
table properties or string lengths). As a nice bonus these transformer
functions will work equally well with data frames, database tables, and
Spark tables. The included functions are:
tt_summary_stats()
, tt_string_info()
,
tt_tbl_dims()
, tt_time_shift()
, and
tt_time_slice()
.
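As one small example of a table transformer, tt_tbl_dims() turns a table into a two-row table of its dimensions, which can then be validated like any other table (the limit of 10 columns is arbitrary):

```r
library(pointblank)

# A table of table properties: one row for "rows", one for "columns"
dims <- tt_tbl_dims(tbl = small_table)
n_rows <- dims$value[dims[[".param."]] == "rows"]

# Validate a table property: fewer than 10 columns?
cols_ok <- test_col_vals_lt(
  dims[dims[[".param."]] == "columns", ],
  columns = value,
  value = 10
)
```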
Two new datasets have been added: specifications
and
game_revenue
. The former dataset can be used to test out
the col_vals_within_spec()
validation function. The latter
dataset (with 2,000 rows) can be used to experiment with the
tt_time_shift()
and tt_time_slice()
table
transformer functions.
Added the Polish ("pl"
), Danish ("da"
),
Turkish ("tr"
), Swedish ("sv"
), and Dutch
("nl"
) translations.
The scan_data()
function is now a bit more
performant, testable, and better at communicating progress in generating
the report.
The preconditions
argument, used to modify the
target table in a validation step, is now improved by (1) checking that
a table object is returned after evaluation, and (2) correcting the YAML
writing of any preconditions
expression that’s provided as
a function.
The x_write_disk() and x_read_disk() functions
have been extended to allow the writing and reading of
ptblank_tbl_scan objects (returned by scan_data()).
Print methods received some love in this release. Now,
scan_data()
table scan reports look much better in
R Markdown. Reporting objects from
get_agent_report()
, get_informant_report()
,
and get_multiagent_report()
now have print methods and work
beautifully in R Markdown as a result.
The incorporate()
function, when called on an
informant object, now emits styled messages to the console. And
when using yaml_exec()
to process an arbitrary amount of
YAML-based agents and informants, you’ll be given
information about that progress in the console.
all_passed(), get_data_extracts(), get_multiagent_report(),
get_sundered_data(), has_columns(), write_testthat_file(),
x_write_disk(), and yaml_exec().
New functions for set-based interrogations:
col_vals_make_set() (+ expect_col_vals_make_set() and
test_col_vals_make_set()) and col_vals_make_subset() (+
expect_col_vals_make_subset() and test_col_vals_make_subset());
they answer the following two
questions: (1) is a set of values entirely accounted for in a column of
values?, and (2) is a set of values a subset of a column of
values?
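The two questions can be tried out on the f column of the bundled small_table dataset, which contains the values "high", "low", and "mid":

```r
library(pointblank)

# (1) Is the whole set accounted for in the column?
set_ok <- test_col_vals_make_set(
  small_table, columns = f, set = c("low", "mid", "high")
)

# (2) Is the set a subset of the column's values?
subset_ok <- test_col_vals_make_subset(
  small_table, columns = f, set = c("low", "high")
)
```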
New functions for order-based interrogations:
col_vals_increasing()
(+
expect_col_vals_increasing()
and
test_col_vals_increasing()
) and
col_vals_decreasing()
(+
expect_col_vals_decreasing()
and
test_col_vals_decreasing()
); they check that column values
are either increasing or decreasing and both have options to allow for
non-moving values and backtracking (with a threshold).
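A short sketch of the non-moving values option, using a made-up column with one repeated value:

```r
library(pointblank)

tbl <- data.frame(x = c(1, 2, 2, 3))

# Strictly increasing by default, so the repeated 2 counts as a failure
strict_ok <- test_col_vals_increasing(tbl, columns = x)

# Allow non-moving (stationary) values
relaxed_ok <- test_col_vals_increasing(tbl, columns = x, allow_stationary = TRUE)
```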
Several functions added to facilitate multi-agent workflows:
create_multiagent()
, read_disk_multiagent()
,
and get_multiagent_report()
; these workflows help to track
interrogation results across multiple agents and the reporting scales
well from several to dozens of agents.
The new function write_testthat_file()
generates a
testthat test file and puts it in
tests/testthat
if certain conditions are met; this converts
an agent’s validation plan into separate
expect_*()
statements.
New functions tbl_store(), tbl_source(), and tbl_get() added
for centrally managing table-prep formulas.
Added the yaml_exec()
function that processes all
relevant pointblank YAML files in a directory;
execution involves interrogation of agents (given YAML agents) and
incorporation of informants (given YAML informants), saving all the
processed objects to an output directory.
The new functions file_tbl()
and helper
from_github()
make it easy to generate a table from a
compatible data file; a file could be in the form of CSV
,
TSV
, RDA
, or RDS
.
Several functions have been added for modifying an
agent’s validation plan: activate_steps()
,
deactivate_steps()
, remove_steps()
.
Added the snip_stats()
function for generating an
in-line statistical summary in an information report.
Add sorting options for snip_list()
so we can choose
to sort column items by frequency or sequentially
(alphabetically/numerically).
More improvements were made to snip_list()
to: (1)
have a better default appearance, (2) enable more customization, and (3)
include localization options for the supported spoken
languages.
Added several options for customizing the main reporting heading in three reporting objects: the agent report, the information report, and the multiagent report.
The active argument in every validation function can
now take an expression that evaluates to a logical; the
has_columns() function has been added to make it easy to express in
active whether one or more columns are present in the
target table (e.g., perform the validation step only if the target
column is available).
Added support for using Arrow tables as target tables for informant objects.
Added information on YAML representations of all validation functions and several other functions that make an appearance in YAML.
General improvements to function documentation were made to a wide cross section of the exported functions.
Included method for writing an informant object to disk
(with x_write_disk()
).
Many fixes were made and tests added to ensure that
agents survive the YAML roundtrip (so agent
%>%
yaml_write()
then yaml_read_agent()
creates
the same agent
object).
Updated several internal dplyr::arrange()
statements
used by scan_data()
so that warnings aren’t issued by
dbplyr (for table scans operating on
tbl_dbi
objects).
All tidyselect expressions used with agents are now preserved when the agent is written to YAML.
The new information management workflow is full of features that help you to describe tables and keep on top of changes to them. To make this work well, a new character enters: the informant!
Added the create_informant()
function to create a
ptblank_informant
object (this function is similar to
create_agent()
). It is meant to hold information (as much
as you want, really) for a target table, with reporting features geared
toward communication.
Functions for facilitating entry of info text were added
because we need them (info_tabular()
,
info_columns()
, and info_section()
). These are
focused on describing columns, the table proper, and other misc.
fields.
If all that wasn’t enough, this release adds
info_snippet()
to round out the collection of
info_*()
functions for this workflow. Oh, hang on, there’s
also the all-important incorporate()
function. What? To
explain, the idea is to have some methodology for acquiring important
bits of data from the target table (that’s info_snippet()
’s
job) and then use incorporate()
to grab those morsels of
data and stitch them into the info text (via
{ }
).
Added the get_informant_report()
function for
printing the information report (a gt table
object!).
You can also just print the informant object to show the information report thanks to a print method for this purpose.
The informant object can be written to
pointblank YAML using the revised
yaml_write()
(previously agent_yaml_write()
)
function. We can actually write both the agent and the
informant to the same YAML file which is useful since both
objects share the same target table. Reading is done with the
yaml_read_agent()
and yaml_read_informant()
functions.
The informant can be emailed using the
email_create()
function; this emailing can be done in one
of eight languages for the stock message text.
More text in the agent report is translated now.
Improved the Spanish (Spain) translation.
Added the Portuguese ("pt"
, Brazil), Chinese
("zh"
, China mainland), and Russian ("ru"
)
translations.
Added a locale option for reporting; the locale will match the
language (using the base locale) unless a different locale is specified.
The locale is used to format numeric values according to the locale’s
rules. This also applies to the reporting offered by the
scan_data()
function.
All stock email message parts (used when emailing the agent report or the information report) have been translated to the eight supported languages. The language setting in the respective objects is used to determine the language of the stock message parts.
The yaml_write()
function replaces the
agent_yaml_write()
function. The new function works to
write the agent, the informant object, or both, to
YAML.
The names of more YAML functions have been changed; the final
roster now consists of: yaml_write()
,
yaml_read_agent()
, yaml_read_informant()
,
yaml_agent_interrogate()
, yaml_agent_string()
,
and yaml_agent_show_exprs()
.
The x_write_disk()
function replaces the
agent_write()
function. The new function works to write the
agent or the informant object to disk.
The x_read_disk() function replaces the
agent_read() function. The new function works to read either
the agent or the informant object written to
disk.
The email_preview()
function has been renamed to
email_create()
.
The new db_tbl()
function makes it ridiculously easy
to access a database table from the selection of databases that
pointblank supports for validation; they are accessible
with the supplied keywords "postgres"
(PostgreSQL),
"mysql"
(MySQL), "maria"
(MariaDB),
"duckdb"
(DuckDB), "sqlite"
(SQLite), or, with
any driver function you’d like to supply.
Added the log4r_step()
function which can be used as
an action in an action_levels()
function call (i.e., a list
component for the fns
list). We can place a call to this
function in every condition that should produce a log entry (i.e.,
warn
, stop
, notify
).
Added several articles that explain the different validation workflows (there are six of ’em) and articles that go over the Information Management workflow.
Improved documentation for almost all functions in the package; added more useful examples.
Added a table to the project README
that keeps
everyone apprised of the project milestones and the issues to be closed
for each upcoming release.
Improved appearance of the agent report: (1) more tooltips, (2)
the tooltips are much improved (they animate, have larger text, and are
snappier than the previous ones), (3) SVGs are now used as symbols for
the validation steps instead of blurry PNGs, (4) less confusing glyphs
are now used in the TBL
column, (5) the agent label can be
expressed as Markdown and looks nicer in the report, (6) the table type
(and name, if supplied as tbl_name
) is shown in the header,
(7) validation threshold levels also shown in the table header, (8)
interrogation starting/ending timestamps are shown (along with duration)
in the table footer, (9) the table font has been changed to be less
default-y, and (10) adjustments to table borders and cell shading were
made for better readability.
The get_agent_report()
function now has
lang
and locale
arguments to override any of
those values set prior (e.g., in create_agent()
). This
allows for the reporting language to be changed without the need to
re-run everything from scratch.
The set_tbl()
, remove_tbl()
,
set_read_fn()
, and remove_read_fn()
functions
can now also be used with an informant object.
The get_sundered_data() function is more clear with
regard to which validation steps are considered for splitting of the
data. Validation steps that use preconditions must
fulfill the rule that the target table has only a single form across
steps.
The is_exact
argument is new in the
col_schema_match()
, expect_col_schema_match()
,
and test_col_schema_match()
functions, further allowing
these types of validations to be less stringent. This argument loosens
the requirement to include all class names for a column that may have
multiple. Also, we can specify NULL
to entirely skip the
checking of a class/type.
We can now use more combinations of validation functions in
conjointly()
. Those validation functions that intrinsically
operate over a single test unit (e.g., all of the
col_is_*()
functions) now work in combination with
validation functions that operate over n test units (e.g., the
col_vals_*()
functions). This lets us test for a condition
where columns are of a certain type AND individual test units
fulfill the col_vals_*()
requirements.
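The combination described above might look like the following sketch, pairing a single-unit column-type check with a row-wise check (the column and threshold are arbitrary, and small_table is the bundled dataset):

```r
library(pointblank)

joint_ok <- test_conjointly(
  small_table,
  # Operates over a single test unit (the column type)
  ~ col_is_numeric(., columns = d),
  # Operates over n test units (one per row)
  ~ col_vals_gt(., columns = d, value = 100)
)
```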
Simplified the sections
argument of
scan_data()
to be a length-1 character vector containing
key characters standing for section names.
Refactored a large portion of the code that produces the agent report to increase rendering speed.
Improved printing of errors/warnings (in the tooltips of the
EVAL
column in the agent report) thanks to implementation
of HTML escaping.
The small version of the agent report (perfect for emailing) now has much improved formatting.
Fixes a performance issue for validations on larger tables.
Improved formatting of value ranges in the agent report.
Improved compatibility with validations performed on SQL Server 2019.
Integrated the label
argument into all validation
functions; this label is available in the agent x_list
and,
more importantly, displayed in the agent report (in the
STEP
column).
Added the "combined"
option in the
get_sundered_data()
function (for the type
argument). This applies a categorical (pass/fail) label (settable in the
new pass_fail
argument of the same function) in a new
.pb_combined
flag column of the output table.
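A minimal sketch of the "combined" option, with an arbitrary validation step on the bundled small_table dataset:

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) |>
  col_vals_gt(columns = d, value = 1000) |>
  interrogate()

# Keep all rows and add a pass/fail flag column instead of splitting
combined <- get_sundered_data(agent, type = "combined")

has_flag <- ".pb_combined" %in% names(combined)
```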
Made several visual improvements to the agent report.
The agent can now be given a table-reading function,
which is used for reading in the data during an interrogation. If a
tbl
is not provided, then this function will be invoked.
However, if both a tbl and a read_fn are
specified, then the supplied tbl will take priority (useful
for one-shot interrogations with a table in an interactive context).
There are two ways to specify a read_fn
: (1) using a
function (e.g., function() { <table reading code> }
)
or, (2) with an R formula expression (e.g.,
~ { <table reading code> }
).
Added a set of functions for setting and removing an agent’s
association to a data table (set_tbl() and
remove_tbl()) or a table-reading function
(set_read_fn() and remove_read_fn()).
All validation functions now have a step_id
parameter. The use of step IDs serves to distinguish validation steps
from each other and provide an opportunity for supplying a more
meaningful label compared to the step index. Supplying a
step_id
is optional; pointblank will
automatically generate the step ID value (based on the step index) if
it’s not provided.
Added new functions for reading and writing YAML (here, called
pointblank YAML). A pointblank YAML
file can be generated with an agent by using the
agent_yaml_write()
function. You’re always free to create
pointblank YAML by hand, or, you can edit/extend an
existing pointblank YAML file. An agent can be created
from pointblank YAML with the
agent_yaml_read()
function. It’s also possible to
interrogate a target data table right from pointblank
YAML by using agent_yaml_interrogate()
.
The agent_write()
and agent_read()
functions were added; they allow for saving the agent to disk and
reading the agent back from disk. Saved-to-disk agents still retain
their validation plans, intel from interrogations, and their reference
to a target table (the read_fn
value) and even the entire
target table (if requested). Reading an agent from disk with
agent_read()
allows us to use post-interrogation functions
(e.g., get_agent_x_list()
,
get_data_extracts()
, get_agent_report()
, etc.)
as though the interrogation had just occurred.
pointblank is now compatible with Spark
DataFrames through the sparklyr package. Simply use a
tbl_spark
object when specifying the target table in
create_agent()
, set_tbl()
, or
scan_data()
.
An issue with showing the agent report table in the email message
body via the email_blast()
function has been
resolved.
Resolved issue with using literal character values in
comparison-based validation functions (e.g.,
col_vals_between()
, col_vals_gt()
,
etc.).
Completely rewrote the underlying processes for the storage and retrieval of translation text.
Much improved translations of reporting text in the Spanish and German languages. Thanks @pachamaltese and @DavZim for these valuable contributions!
New testthat tests were added that test pointblank validations against mock PostgreSQL and MySQL database tables via the dittodb package. Thank you @pachamaltese for implementing these tests.
New R Markdown validation feature allows for validation testing
within specialized validation code chunks where the
validate = TRUE
option is set. Using
pointblank validation functions on data in these marked
code chunks will flag overall failure if the stop threshold is exceeded
anywhere. All errors are reported in the validation code chunk after
rendering the document to HTML, where green or red status buttons
indicate whether all validations succeeded or failures occurred.
Clicking any such button reveals the otherwise hidden validation
statements and their error messages (if any). Using
pointblank in an R Markdown workflow is enabled by
default once the pointblank library is loaded. While
the framework for such testing is set up by default, the new
validate_rmd()
function offers an opportunity to set UI and
logging options.
Added an R Markdown template for the new R Markdown validation
feature (Pointblank Validation
).
The new stop_if_not() function works well as a
standalone replacement for stopifnot() but is also
customized for use in validation checks in R Markdown documents where
pointblank is loaded. Using stop_if_not()
in a code chunk where the validate = TRUE option is set
will yield the correct reporting of successes and failures whereas
stopifnot() does not.
A knit.print()
method was added to facilitate the
printing of the agent report table within an R Markdown code
chunk.
col_vals_lt()) directly on data tables has been changed.
Before, a single test unit failure would trigger a warning. Now, a
single test unit failing results in an error. Going back to the earlier
behavior now requires the use of actions = warn_on_fail()
(a new helper function, which has a default warn_at
threshold value of 1) with each invocation of a validation
step function. The stop_on_fail() helper function is also
new in this release, and has a stop_at threshold parameter,
also with a default of 1.
Added 24 expectation functions (e.g.,
expect_col_exists(), expect_rows_distinct(),
expect_col_schema_match(), etc.) as complements of the 24
validation functions. All of these can be used for
testthat tests of tabular data with a simplified
interface that exposes an easy-to-use failure threshold
(defaulting to 1).
Added 24 test functions (e.g.,
test_col_exists()
, test_rows_distinct()
,
test_col_schema_match()
, etc.) to further complement the 24
validation functions. These functions return a logical value:
TRUE
if the threshold (having a default of 1
)
is not exceeded, FALSE
otherwise. These
test_*()
functions use the same simplified interface of the
expect_*()
functions.
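The threshold behavior of the test_*() functions can be sketched with the bundled small_table dataset, where 6 of the 13 values in column d are not greater than 1000 (the threshold of 10 is an arbitrary choice):

```r
library(pointblank)

# Default threshold of 1: any failing test unit yields FALSE
strict <- test_col_vals_gt(small_table, columns = d, value = 1000)

# A looser threshold tolerates the 6 failing test units
lenient <- test_col_vals_gt(small_table, columns = d, value = 1000, threshold = 10)
```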
Added the col_vals_expr()
,
expect_col_vals_expr()
, and
test_col_vals_expr()
validation,
expectation, and test functions, making it easier for
DIY validations. The dplyr expr()
,
case_when()
, and between()
functions were
re-exported for easier accessibility here since they work exceedingly
well with the new functions.
col_schema_match()
(and its expect and
test analogues) gained new arguments: complete
and
in_order
. These allow for some relaxation of constraints
related to the completeness and ordering of columns defined in a
col_schema
object (created by
col_schema()
).
The preconditions
argument available in all
validation, expectation, and test functions
now accepts both formula and function values (previously, only formula
values were accepted).
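Both accepted forms of preconditions are shown in this sketch; the filtering step is only an example and relies on dplyr being installed.

```r
library(pointblank)

# Formula form: `.` stands for the target table
ok_formula <- test_col_vals_gt(
  small_table, columns = d, value = 1000,
  preconditions = ~ . %>% dplyr::filter(d > 1000)
)

# Function form: the target table is passed as the first argument
ok_function <- test_col_vals_gt(
  small_table, columns = d, value = 1000,
  preconditions = function(x) dplyr::filter(x, d > 1000)
)
```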
The get_agent_report()
function now has a
size
argument as an option to get the agent report table in
the "standard"
(width: 875px) size or the
"small"
size (width: 575px); previously this option was
only accessible through ...
.
The appearance of the agent report has improved and it’s gained
some new features: (1) data extracts for failing rows (on row-based
validation steps) can be downloaded as CSVs via the new buttons that
appear in the EXT
column, (2) there are useful tooltips
on most fields of the table (e.g., hovering over items in
STEP
will show the brief, TBL
icons will
describe whether any preconditions were applied to the table prior to
interrogation, etc.), and (3) there are printing improvements in the
COLUMNS
and VALUES
columns (e.g., table
columns are distinguished from literal values).
Improved the appearance of the email message generated by
email_blast()
and email_preview()
. This email
message, when using the stock_msg_body()
and
stock_msg_footer()
as defaults for msg_body
and msg_footer
, embeds a "small"
version of
the agent report and provides some introductory text with nicer
formatting than before.
All functions now have revised documentation that is more complete, has more examples, and is consistent across the many validation, expectation, and test functions.
The package README
now contains better graphics,
some reworked examples, and a new section on the package’s design goals
(with a listing of other R packages that also focus on
table validation).
Rewrote the internal stock_stoppage()
and
stock_warning()
functions so that the generated error and
warning messages are tailored to whether validation functions are used directly on
data or expectation functions are being used.
Console status messages when performing an interrogation now only appear in an interactive session. They will no longer appear during R Markdown rendering or during execution of unattended scripts.
The col_vals_regex()
validation function
(plus the associated expectation and test functions)
can now be used with database tables (on some of the DB types that
support regular expressions). This has been tested on MySQL and
PostgreSQL, which have differing underlying SQL
implementations.
The col_schema()
function now allows for either
uppercase or lowercase SQL column types (using
.db_col_types = "sql"
). Previously, supplying SQL column
types as uppercase (e.g., “INT”, “TINYINT”, etc.) would always fail
validation because the SQL column types of the target table are captured
as lowercase values during the create_agent()
call.
Many new tests were added to cover both the new functions and the existing functions. It’s important for a validation package that testing be comprehensive and rigorous, so this will continue to be a focus in forthcoming releases.
Fixed a duration label bug in the console status messages that appear during interrogation (values are now consistently reported in seconds)
Added column validity checks inside of internal
interrogate_*()
functions
Fixed implementation of the col_vals_between()
and
col_vals_not_between()
step functions to work with
tbl_dbi
objects.
Added the scan_data()
function, which thoroughly
scans any table data so you can understand it better (giving you an HTML
report).
Added the get_agent_x_list()
function to provide
easy access to agent intel
Added the get_agent_report()
function to give fine
control over the agent’s gt-based reportage; also, the agent’s default
print method is now that report (with default appearance
options)
Added the get_sundered_data()
function to split the
table data into ‘pass’ and ‘fail’ pieces after interrogation
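A rough sketch of splitting a table after interrogation follows. The table and the validation step are invented for the example.

```r
library(pointblank)

# Hypothetical example table
tbl <- data.frame(a = c(1, 3, 5))

# One row-based validation step: values in `a` should exceed 2
agent <- create_agent(tbl = tbl) %>%
  col_vals_gt(vars(a), value = 2) %>%
  interrogate()

# Rows that passed every row-based validation step
passed <- get_sundered_data(agent, type = "pass")

# Rows that failed at least one row-based validation step
failed <- get_sundered_data(agent, type = "fail")
```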
Added the col_schema_match()
validation step
function; it works in conjunction with a col_schema
object
(generated through the col_schema()
function) to help
determine whether the expected schema matches the target table.
Added multilingual support to reports generated by agent
validations and by those produced through the new
scan_data()
function
More fully integrates the gt (for tables in reports) and blastula (for email production and delivery) packages
Numerous fixes to ensure compatibility with tibble 3.0.0
The pointblank package has been changed significantly from the previous version in favor of consistency and simplicity, better reporting, and increased power. The internals have been extensively refactored and the API has accordingly gone through revisions.
The focus_on()
function has been removed in favor of
directly using a data object. This means that a single use of
create_agent()
can now only work on a single table at a
time (create_agent()
now has a tbl
argument).
Also, the input tbl
can be a data.frame
, a
tbl_df
, or a tbl_dbi
object.
The preconditions
argument has changed and it can
now be used to temporarily transform the table (i.e., transforming for a
particular validation step). Previously, this option could only filter
the input table but now it’s possible to do useful things like joining
in a table, adding columns, filtering rows, etc. The
preconditions
argument now accepts a list of expressions that
manipulate the table data.
The action_levels()
helper function is introduced to
work with the actions
argument (in every validation step
function). This replaces the warn_count
,
stop_count
, notify_count
,
warn_fraction
, stop_fraction
, and
notify_fraction
arguments. The function allows for
evaluation of functions (given in the fns
argument) as a
reaction to exceeding thresholds specified in warn_at
,
stop_at
, and notify_at
.
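A minimal sketch of setting thresholds through this helper is shown below. The table, the threshold values, and the validation step are assumptions chosen for illustration; the thresholds are given here as fractions of failing test units.

```r
library(pointblank)

# Hypothetical example table
tbl <- data.frame(a = c(1, 3, 5, 7))

# Thresholds expressed as fractions of failing test units
al <- action_levels(warn_at = 0.25, stop_at = 0.5)

# Supplying `actions` at agent creation makes these thresholds the
# default for the validation steps that follow
agent <- create_agent(tbl = tbl, actions = al) %>%
  col_vals_gt(vars(a), value = 2) %>%
  interrogate()
```

An `actions` value can also be passed to an individual validation step when a step needs its own thresholds.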
When using validation step functions directly on data (i.e., no
use of create_agent()
), data is now passed straight through
after that validation step. The purpose now in that mode is to create
warnings or throw errors if the warn
or stop
thresholds are exceeded.
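The pass-through behavior could be sketched as follows. The table is made up, and the `stop` threshold is set explicitly through `action_levels()` so that the example does not depend on any default.

```r
library(pointblank)

# Hypothetical example table
tbl <- data.frame(a = c(1, 2, 3))

# Stop as soon as a single test unit fails
strict <- action_levels(stop_at = 1)

# All values pass, so the data flows through the step
checked <- tbl %>% col_vals_gt(vars(a), value = 0, actions = strict)

# Here every value fails, so the step throws an error; the error is
# caught only to keep this example runnable from top to bottom
outcome <- tryCatch(
  {
    tbl %>% col_vals_gt(vars(a), value = 100, actions = strict)
    "passed"
  },
  error = function(e) "stopped"
)
```

This makes validation steps usable as guards inside an ordinary data pipeline.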
Across all pointblank validation step functions,
the argument that stands for table columns has been normalized to
columns
.
The incl_na
argument, which was implemented in a few
validation step functions, has been renamed to na_pass
to
better indicate its purpose (to consider any encountered NA
values as passing test units), and its use has been expanded to other
relevant functions.
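The effect of the renamed argument can be sketched on a small invented table. The test variant is used here only so that each result is a single logical value.

```r
library(pointblank)

# Hypothetical example table containing a missing value
tbl <- data.frame(a = c(1, NA, 3))

# By default, the NA counts as a failing test unit
na_fails <- test_col_vals_gt(tbl, columns = vars(a), value = 0)

# With na_pass = TRUE, the NA counts as a passing test unit
na_passes <- test_col_vals_gt(
  tbl, columns = vars(a), value = 0, na_pass = TRUE
)
```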
It’s now possible to use vars()
and certain
tidyselect select helpers (e.g., starts_with()
) when
defining columns
in the pointblank
validation step functions.
The conjointly()
function is a new validation step
function that allows for multiple rowwise validation steps to be
performed for joint validity testing.
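A joint validation might be written along the lines of the sketch below, where each formula holds one rowwise validation step and `.` stands for the table. The table and the two conditions are assumptions for the example.

```r
library(pointblank)

# Hypothetical example table
tbl <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# A row passes only if it satisfies both conditions at once
agent <- create_agent(tbl = tbl) %>%
  conjointly(
    ~ col_vals_gt(., vars(a), value = 0),
    ~ col_vals_lt(., vars(b), value = 10)
  ) %>%
  interrogate()

# TRUE when no validation step has failing test units
joint_ok <- all_passed(agent)
```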
Revisions on account of API changes in tidyr
1.0.0
.
Incorporates corrections related to API changes in
rlang 0.2.0
.