Building data tests in PowerShell
Data tests in PowerShell
The solution we settled on has the following steps: Define a PS DataTable to store results We first create a DataTable in PowerShell for storing the results of the data tests.$results = New-Object System.Data.DataTable
$sheetName = New-Object System.Data.DataColumn sheetName,([string]) # Name of worksheet these results will be stored in.
$status = New-Object System.Data.DataColumn status,([string])
$output = New-Object System.Data.DataColumn output,([string])
# Define table columns
$results.Columns.Add($sheetName)
$results.Columns.Add($status)
$results.Columns.Add($output)
Define the test, pass/fail and status messages Here, we define the data tests query, pass/fail threshold and what to do if the query passes or fails. What happens within Get-ImportStatus is that the testQuery value is run and if the number of results is greater than or equal to the threshold, the data test fails. The errorQuery is then run to get the failed messages. If the test succeeds, the successQuery is run.
$testResults = Get-ImportStatus `
-sqlConnection $sqlConnection `
-testThreshold 0 `
-testQuery "SELECT count(1) FROM [config].[IngestionProcessLog] WHERE [Error_Process] IS NOT NULL;" `
-successMsg "No Errors encountered during processing:" `
-successQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NULL;" `
-errorMsg "The below Errors Have been recorded:" `
-errorQuery "SELECT [Source_Process],[Process_Status],[Log_DateTime] FROM [config].[IngestionProcessLog] where [Error_Process] IS NOT NULL;"
Call a stored procedure to run the test and return results in CSV Taking advantage of a stored procedure I found on Stack Overflow, we get the results in CSV format.
function Get-ImportStatus {
param (
$sqlConnection,
$testThreshold, # Test query number that the results are failed at.
$testQuery, # Should return a single number
$successMsg,
$errorMsg,
$successQuery,
$errorQuery
)
#
# This procedure runs the testQuery sent in and passes/fails the results based on the TestThreshold.
# Returns a hashtable of results
#
[hashtable]$row = @{}
$testStatus = Test-DataStatus -SQLConnection $sqlConnection -Query $testQuery # This just runs the SQL Query
if ($testStatus -gt $testThreshold) {
$row.status = $errorMsg
$row.output = Get-ImportData -SQLConnection $sqlConnection -Query $errorQuery
}
else {
$row.status = $successMsg
$row.output = Get-ImportData -SQLConnection $sqlConnection -Query $successQuery
}
return $row
}
Store results in the DataTable The returned results are then stored in the DataTable for reporting or manipulating in any way we want.
$row = $results.NewRow()
$row.sheetName = "ProcessStatus"
$row.status = $testResults.status
$row.output = $testResults.output
$results.Rows.Add($row)
I hope that through this demonstration we can help others who may also encounter this problem.
On this page
Share this
Share this
More resources
Learn more about Pythian by reading the following blogs and articles.
Near Real-Time Data Processing for BigQuery: Part Two
Near Real-Time Data Processing for BigQuery: Part Two
May 4, 2021 12:00:00 AM
7
min read
Data preparation with dbt and BigQuery
Data preparation with dbt and BigQuery
Jul 21, 2021 12:00:00 AM
8
min read
Replicating MySQL to Snowflake with Kafka and Debezium—Part Two: Data Ingestion
Replicating MySQL to Snowflake with Kafka and Debezium—Part Two: Data Ingestion
May 17, 2021 12:00:00 AM
14
min read
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.