What Data Validity Means
Data validity measures the degree to which data values conform to the syntax, format, range, and business rules defined for each field. A value can be present (complete) and unique — but still invalid if it doesn't conform to what that field is supposed to contain.
Validity is about rules. Every field in your data has implied or explicit rules about what values are acceptable. Validity measures how consistently those rules are followed.
Examples of invalid data:
- A date of 2026-02-30 (February 30 doesn't exist)
- An email address without an @ symbol
- A product price of -$50.00
- A US phone number with 9 digits instead of 10
- A status field containing "Pendin" instead of "Pending" (a typo that breaks automations relying on that value)
- A customer age of 847 (clearly a data entry error)
- An order with a delivery date before its order date
All of these values exist in the field. None of them are valid. And all of them will cause problems downstream — in reports, in integrations, in the automations that depend on those values being correct.
The Three Types of Validity Rules
Syntactic validity: Does the value conform to the correct format or pattern?
Sohovi profiles your datasets for quality issues in minutes — see what's broken before it breaks your pipeline — try Sohovi free.
- Email: must match the pattern local@domain.tld (an @ sign, a domain, a top-level domain)
- Phone: must contain the right number of digits for the country (10 for US numbers without country code)
- Date: must be a real calendar date in the specified format
- Zip code: must be 5 digits or 5+4 digits for US postal codes
- URL: must start with http:// or https:// and include a valid domain structure
Range validity: Does the value fall within acceptable bounds?
- Age: between 0 and 130 (values outside this range are almost certainly errors)
- Temperature in Celsius: above absolute zero (-273.15°C)
- Order quantity: greater than 0 (an order for zero units doesn't make sense)
- Discount percentage: between 0 and 100 (a 150% discount is invalid)
- Revenue: cannot be negative unless your business definition includes returns and refunds
Business rule validity: Does the value satisfy rules specific to your data model?
- A customer's "last purchase date" cannot be in the future
- An end date cannot be before its corresponding start date
- A shipped order's tracking number cannot be blank
- A paid invoice's payment date cannot be blank
- A "closed won" deal in your CRM must have a close date
Sohovi finds gaps, duplicates, and format errors in your CRM data — so your team is working from records they can trust.
Business rule validity is the hardest to catch with generic tools because the rules are specific to your data model. A date value that's syntactically valid and within a reasonable range may still violate your business rules — an invoice payment date of five years ago might be technically valid but practically impossible for a business that's only been operating for two years.
Why Invalid Data Is Different From Inaccurate Data
Validity and accuracy are often confused, but they're distinct dimensions that require different fixes.
Validity is a structural question: does this value conform to the rules? A phone number with 9 digits fails a validity check. An email with two @ signs fails a validity check. These can be caught automatically with pattern matching and rules.
Accuracy is a factual question: is this value actually true? A phone number that has the right format (10 digits, passes validity) but belongs to a different person is accurate-format but factually inaccurate. Accuracy problems require external verification; validity problems can be caught internally.
Sohovi scores your dataset against your own accuracy standards and highlights the columns and rows where values fall outside expected ranges.
This matters for prioritization: validity fixes are usually automated (run a rule, flag failures, correct or remove). Accuracy fixes often require manual review, external data sources, or re-contacting the customer.
Measuring Validity
Validity rate per field:
Validity = (Values passing all validity rules / Total non-null values) × 100
Note that this metric only applies to non-null values. A null value doesn't violate validity — it violates completeness. Validity checks apply to values that are present; completeness checks apply to whether values are present at all.
Measure validity field by field, not as a single aggregate score. A dataset might have 99% valid email addresses but only 85% valid phone numbers. Knowing which specific fields have validity problems tells you where to focus remediation.
Sohovi runs format and pattern checks on your columns automatically when you upload a CSV — surfacing which fields have values that don't match the expected pattern and how many records are affected.
Enforcing Validity at the Source
The most cost-effective validity control is preventing invalid values from entering your system in the first place. Every invalid value caught at entry costs you nothing to fix. Every invalid value discovered after it's propagated through your systems, been used in reports, and influenced decisions costs far more.
Use structured inputs instead of free text wherever possible: Date pickers instead of date fields where users type dates. Dropdown menus instead of free-text status fields. Phone number fields with format masks. The less free text you accept, the fewer validity problems you encounter.
Add format validation to forms: Most CRM platforms and form builders support regular expression validation. An email field with a regex check for the @ symbol catches the majority of email format errors at entry.
Use database constraints for critical fields: NOT NULL constraints for required fields, CHECK constraints for range validations, foreign key constraints for referential integrity. These enforce validity rules at the database level and reject violations before they enter your data.
Validate import files before importing: Before importing any external data file — from a vendor, from a trade show, from an old system — run a validity check on the columns you'll be importing. Catching format problems before import is far cheaper than cleaning them after.
Building a Validity Baseline
Most teams don't know their current validity rates because they've never measured them. The first step is establishing a baseline:
- Export your most important dataset as a CSV
- Profile it with a tool that checks format and pattern validity
- Document the validity rate for each key field
- Prioritize the fields with the lowest validity rates
- Fix the top issues and re-measure monthly
Once you have a baseline, you can track whether validity is improving or declining over time — which tells you whether your prevention measures are working.
If you're ready to see the validity rates for your most important data, Sohovi will give you an instant quality report. Upload your first CSV free — no credit card, no setup, no data leaves your machine.