This article gives an overview of data completeness and explains how it is measured in behavior analytics models.
What is data completeness?
Data completeness is an essential data quality metric, alongside consistency, validity, uniqueness, timeliness, integrity, and accuracy. Data is complete when it has no missing information or duplicates.
Why is data completeness required?
Resolution Intelligence Cloud supports data completeness in behavior analytics so that predictions are accurate and the overall performance of the ML models improves.
Data completeness also gives you greater transparency into your behavior analytics models and helps you detect data ingestion problems anywhere in the data pipelines.
How do you measure data completeness in behavior analytics?
Measuring data completeness depends on two factors:
- Model aggregation interval
- Data load type (incremental or initial load)
Three practical methods of measuring data completeness are implemented in behavior analytics as follows:
Daily Analysis
When the aggregation level is set to daily, hourly, or a specific hour of the day and the data load is incremental, the model considers the data for a single day and calculates completeness using the following formula:
Completeness = (Ceiling Hour of the event / 24) * 100
For this calculation, the model considers only non-empty columns, finds the last event recorded on the given day, and takes the ceiling of the hour at which that event occurred.
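The daily formula can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name `daily_completeness` is hypothetical, and the sketch assumes that the "ceiling hour" means the time of the last event, expressed as a fractional hour of the day, rounded up to the next whole hour.

```python
import math
from datetime import datetime


def daily_completeness(last_event: datetime) -> float:
    """Return completeness (%) for one day from the time of its last event.

    Assumption: the ceiling hour is the fractional hour of day rounded up,
    so an event at 17:20 counts as hour 18. Microseconds are ignored.
    """
    hour_of_day = (
        last_event.hour
        + last_event.minute / 60
        + last_event.second / 3600
    )
    ceiling_hour = math.ceil(hour_of_day)
    return ceiling_hour / 24 * 100


# Last event of the day recorded at 17:20 -> ceiling hour 18 -> 75%
print(daily_completeness(datetime(2024, 1, 15, 17, 20)))  # 75.0
```

Under this reading, a day whose last event arrives shortly before midnight scores 100%, while a day whose ingestion stops at midday scores about 50%.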
Averaged Analysis
For the initial load or the initial hard filter load, with the aggregation level set to weekly, day of the week, or monthly, the model considers the data for each day and averages the data completeness values according to the aggregation level.
The model first calculates a completeness value for each day, as described under Daily Analysis, and then averages those values across the selected interval to give the final score:
Overall data completeness score = (Sum of daily values / Interval days)
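The averaging step can be sketched the same way. The function name `overall_completeness` and the sample values are hypothetical, and the sketch assumes that "Interval days" is the number of days in the selected interval, defaulting to the number of daily values supplied.

```python
from typing import Optional, Sequence


def overall_completeness(
    daily_values: Sequence[float],
    interval_days: Optional[int] = None,
) -> float:
    """Average daily completeness values (%) over an interval.

    Assumption: interval_days defaults to the number of daily values given.
    """
    if interval_days is None:
        interval_days = len(daily_values)
    if interval_days <= 0:
        raise ValueError("interval_days must be a positive number of days")
    return sum(daily_values) / interval_days


# Illustrative weekly interval: seven daily completeness values
week = [100.0, 100.0, 75.0, 100.0, 50.0, 100.0, 87.5]
print(overall_completeness(week))  # 87.5
```

Passing `interval_days` explicitly lets days with no recorded value count as zero, if that is how missing days should be treated; the article does not specify this case.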