Data Quality Tests
Data quality tests validate the reasonableness and business logic correctness of our pricing service data.
These tests go beyond structural integrity to ensure the data makes business sense and falls within expected ranges.
Table of Contents:
- Price Boundary Tests
- Ratio Validation Tests
- Fee Calculation Accuracy Tests
- SSP and Provider Type Plausibility Tests
1. Price Boundary Tests
SSP Maximum Price Bounds Check
Function: test_ssp_max_price_outlier_bounds()
SQL File: sql/qa/ssp_max_price_outlier_bounds.sql
Validates that max_price
values in the SSP table fall within reasonable bounds for non-cash payers.
SELECT count(*) as n
FROM {{ table }}
WHERE max_price IS NOT NULL
AND (max_price < lower_bound OR max_price > upper_bound)
Business Rule: Maximum prices should be between lower and upper bounds defined for each service by Clear Rates
Purpose: Identifies unrealistic price outliers that could indicate data quality issues.
Expected Result: 0 prices outside the valid range
SSP General Price Bounds
SQL File: sql/qa/ssp_max_price_bounds.sql
Ensures SSP prices fall within the $1 to $1M range for general validation.
SELECT count(*) as n
FROM {{ table }}
WHERE max_price IS NOT NULL
AND (max_price < 1 OR max_price > 1000000)
Purpose: Basic sanity check for price values.
Expected Result: 0 prices outside bounds
Regional Average Price Bounds Check
Function: test_payer_provider_type_ssp_cbsa_price_bounds_check()
SQL File: sql/qa/payer_provider_type_ssp_cbsa_price_bounds_check.sql
Validates that SSP max/min prices align with rate table bounds by payer, provider type, SSP ID, and CBSA.
WITH quality_stats AS (
SELECT
payer_id,
provider_type,
ssp_id,
cbsa,
min(lower_bound) as min_lower_bound,
max(upper_bound) as max_upper_bound,
max(max_price) as max_price,
min(max_price) as min_price
FROM {{ ssp_table }} ps
LEFT JOIN {{ rates_table }} r ON ps.max_price_id = r.id
WHERE cbsa IS NOT NULL AND cbsa != '99999' AND payer_id != '0'
GROUP BY 1, 2, 3, 4
)
SELECT count(*) as n
FROM quality_stats
WHERE max_price > max_upper_bound * 1.1
OR min_price < min_lower_bound * 0.9
Business Rule: SSP prices should not exceed rate bounds by more than 10%
Purpose: Ensures consistency between SSP and rates data
Expected Result: 0 cases where prices exceed bounds with 10% tolerance
2. Ratio Validation Tests
Price Ratio Reasonableness Check
Function: test_payer_provider_type_ssp_cbsa_ratio_check()
SQL File: sql/qa/payer_provider_type_ssp_cbsa_ratio_check.sql
Validates that the ratio between maximum and minimum prices within a payer-ssp-cbsa group doesn't exceed reasonable limits.
Business Rule: Price ratios should never exceed 150x (15,000% difference)
Purpose: Identifies cases where price variations are unreasonably large, indicating potential data quality issues
Expected Result: 0 cases where ratio exceeds 150x
3. Fee Calculation Accuracy Tests
Canonical Rate Facility Fee Alignment
Function: test_canonical_rate_facility_fee_alignment()
Location: qa/data_quality_tests.py:90
Validates that facility fees in the SSP table align with canonical rates from the CLD schema and fall within expected bounds.
df = hook.get_pandas_df(f"""
SELECT
ps.facility_fee,
cld.canonical_rate,
cld.canonical_rate_score,
cld.lower_bound,
cld.upper_bound,
cld.state_avg_medicare_rate
FROM {ps_schema}.pricing_service_ssp_{payer_month} ps
JOIN {cld_schema}.prod_combined_all cld
ON ps.roid = cld.roid
AND ps.payer_id = cld.payer_id
WHERE facility_fee IS NOT NULL
""")
Business Rule: Facility fees should closely match canonical rates and fall within lower/upper bounds
Purpose: Ensures facility fee calculations align with established rate benchmarks
Expected Result: All facility fees match canonical rates and stay within bounds
Facility Fee Calculation Accuracy
Function: test_facility_fee_calculation_accuracy()
Location: qa/data_quality_tests.py:126
Verifies that facility fees in the SSP table match the calculated facility fees derived from the rates table.
Test Logic:
- Samples SSP data for non-cash payers (excluding imaging centers)
- Retrieves corresponding rates data for facility fees (package base codes)
- Compares SSP facility_fee with rates table negotiated_rate
Business Rule: SSP facility fees should match rates table calculations
Purpose: Ensures consistency between derived SSP data and underlying rates
Expected Result: Facility fees match calculated values from rates table
Professional Fee Calculation Accuracy
Function: test_professional_fee_calculation_accuracy()
Location: qa/data_quality_tests.py:175
Validates that professional fees in the SSP table match calculated professional fees from the rates table.
Test Logic:
- Samples SSP data for non-cash payers with 5-character billing codes
- Retrieves professional fee rates data
- Calculates expected professional fees:
negotiated_rate × average_units × anesthesia_unit_conversion
- Compares calculated totals with SSP professional_fee values
Business Rule: SSP professional fees should equal sum of calculated professional components
Purpose: Ensures accurate professional fee derivation from rates data
Expected Result: Professional fees match calculated values including unit conversions
4. SSP and Provider Type Plausibility Tests
Imaging Centers Only RA SSPs
Function: test_imaging_centers_only_ra_ssps()
Location: qa/data_quality_tests.py
Validates that Imaging Centers only offer radiology SSPs (those starting with 'RA').
SELECT COUNT(*) as violation_count
FROM pricing_service_ssp
WHERE provider_type = 'Imaging Center'
AND ssp_id NOT LIKE 'RA%'
Business Rule: Imaging Centers should only provide radiology services (RA SSPs)
Purpose: Ensures provider type and service offerings are clinically plausible
Expected Result: 0 violations (no Imaging Centers with non-RA SSPs)
ASCs No OB SSPs
Function: test_ascs_no_ob_ssps()
Location: qa/data_quality_tests.py
Validates that Ambulatory Surgery Centers (ASCs) do not offer obstetrics SSPs (those starting with 'OB').
SELECT COUNT(*) as violation_count
FROM pricing_service_ssp
WHERE provider_type = 'ASC'
AND ssp_id LIKE 'OB%'
Business Rule: ASCs should not provide obstetrics services (OB SSPs)
Purpose: Ensures provider type and service offerings are clinically plausible
Expected Result: 0 violations (no ASCs with OB SSPs)