Data Cleaning Rules
The following transformations are applied to uploaded Excel files to ensure standardized, clean, and anonymized output:
🔄 General Cleaning
- Remove unnamed columns and empty rows.
- Remove duplicate rows and columns.
- Normalize header names to ensure consistent schema.
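
The general cleaning steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function names `clean_sheet` and `normalize_header` are hypothetical, and the snake_case header convention is an assumption, since the source does not say what the normalized schema looks like.

```python
import re

import pandas as pd


def normalize_header(name) -> str:
    """Lower-case a header and collapse non-alphanumeric runs into underscores.

    The snake_case target is an assumed convention, not one stated in the rules.
    """
    return re.sub(r"[^0-9a-z]+", "_", str(name).strip().lower()).strip("_")


def clean_sheet(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the general cleaning rules to one sheet (hypothetical helper)."""
    # pandas labels header-less Excel columns "Unnamed: 0", "Unnamed: 1", ...
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
    df = df.drop(columns=unnamed)
    # Drop rows in which every remaining cell is empty.
    df = df.dropna(how="all")
    # Drop repeated rows, then columns whose values repeat an earlier column.
    df = df.drop_duplicates()
    df = df.loc[:, ~df.T.duplicated().to_numpy()]
    # Normalize headers last so the checks above see the original names.
    df = df.rename(columns=normalize_header)
    return df.reset_index(drop=True)
```

Comparing columns through the transpose is simple but can be slow on very wide sheets; for a sketch of the rule it keeps the logic easy to read.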
🛡️ Sensitive Data Sanitization
- Replace the `NHI` column with an anonymized `ID` (persisted across sheets).
- Convert `DOB` to a calculated `Age`, then remove `DOB`.
- Remove the `Address` and `Contact` columns.
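
One way the sanitization rules might look for a single sheet is sketched below. Several details are assumptions: the helper name `sanitize_sheet` is hypothetical, random UUIDs stand in for whatever anonymization scheme the project actually uses, age is counted in whole years as of a caller-supplied date, and the column names are taken literally from the rules above.

```python
import uuid
from datetime import date

import pandas as pd


def sanitize_sheet(df: pd.DataFrame, id_map: dict, today: date) -> pd.DataFrame:
    """Anonymize one sheet; `id_map` is shared by the caller across sheets."""
    df = df.copy()
    if "NHI" in df.columns:
        # Reuse the ID of an already-seen NHI; otherwise mint a random one
        # (UUIDs are an assumption, the rules only say "anonymized").
        for nhi in df["NHI"].dropna().unique():
            id_map.setdefault(nhi, uuid.uuid4().hex)
        df["ID"] = df["NHI"].map(id_map)
        df = df.drop(columns=["NHI"])
    if "DOB" in df.columns:
        dob = pd.to_datetime(df["DOB"], errors="coerce")
        # Whole years: subtract one if this year's birthday has not happened yet.
        had_birthday = (dob.dt.month < today.month) | (
            (dob.dt.month == today.month) & (dob.dt.day <= today.day)
        )
        age = today.year - dob.dt.year - (~had_birthday).astype(int)
        df["Age"] = age.astype("Int64")  # nullable ints keep unparseable DOBs as <NA>
        df = df.drop(columns=["DOB"])
    # Drop direct identifiers if present.
    return df.drop(columns=["Address", "Contact"], errors="ignore")
```

Passing the same `id_map` dictionary into every call is what keeps a patient's `ID` stable from one sheet to the next.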
⚙️ Transformation Logic
- Multi-sheet alignment: The same `ID` maps across sheets for the same patient.
- Bias-detection rules for demographics (optional future extension).
- File-specific rules applied for:
  - Case-Mix
  - Holistic
  - Fare-up
  - Outpatient
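
The multi-sheet alignment rule can be illustrated with a single mapping that is built once per workbook and applied to every sheet. The sketch below assumes the workbook has already been loaded as a dictionary of sheet name to DataFrame (for example, what `pd.read_excel(path, sheet_name=None)` returns); the function name `align_ids` and the use of random UUIDs are hypothetical choices, not taken from the project.

```python
import uuid

import pandas as pd


def align_ids(sheets: dict, key: str = "NHI") -> dict:
    """Replace `key` with a shared `ID` column in every sheet that has it."""
    id_map: dict = {}  # one mapping for the whole workbook
    aligned = {}
    for name, df in sheets.items():
        if key not in df.columns:
            # Sheets without the key column pass through unchanged.
            aligned[name] = df.copy()
            continue
        # Only values not seen on an earlier sheet receive a new ID.
        for value in df[key].dropna().unique():
            id_map.setdefault(value, uuid.uuid4().hex)
        out = df.drop(columns=[key])
        out.insert(0, "ID", df[key].map(id_map))
        aligned[name] = out
    return aligned
```

Because the mapping lives outside the per-sheet loop, a patient who appears on both the Case-Mix and Outpatient sheets ends up with the same `ID` on each.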