Fast Healthcare Interoperability Resources (FHIR) is an industry standard for normalizing healthcare data so that it can be used for AI/ML model development. In this effort, we compared two approaches for converting source data to FHIR:
1. Next-Gen Connect
2. Custom Python script
The data comprises 55k patient records with 360 million observations, 21 million medication administrations, 4 million medication requests, 120k encounters, 2 million diagnostic reports, and 250k procedures, all of which are converted into a FHIR-based target canonical structure.
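To make the conversion step concrete, the following is a minimal sketch of how one flat observation row might be mapped to a FHIR R4 Observation resource in Python. The column names (`patient_id`, `loinc_code`, `display`, `observed_at`, `value`, `unit`), the use of LOINC as the coding system, and the fixed `final` status are assumptions made for illustration; they are not taken from the actual source schema or from the mapping used in this effort.

```python
import json


def row_to_observation(row):
    """Map one flat observation row to a FHIR R4 Observation as a dict.

    All column names are hypothetical; adjust them to the real CSV schema.
    """
    return {
        "resourceType": "Observation",
        "status": "final",  # assumption: every source row is a finalized result
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",  # assumption: codes are LOINC
                    "code": row["loinc_code"],
                    "display": row["display"],
                }
            ]
        },
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["observed_at"],
        "valueQuantity": {"value": float(row["value"]), "unit": row["unit"]},
    }


# Illustrative row, not real data.
example_row = {
    "patient_id": "p1",
    "loinc_code": "8867-4",
    "display": "Heart rate",
    "observed_at": "2020-01-01T00:00:00Z",
    "value": "80",
    "unit": "beats/minute",
}
print(json.dumps(row_to_observation(example_row), indent=2))
```

The same pattern would extend to the other resource types listed above (for example MedicationRequest or Procedure), with one mapping function per resource type.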
Data Source and Format: Input data is provided in CSV format with a relational structure and contains no personally identifiable information (PII) or protected health information (PHI).
Data Pre-processing: All files larger than 50 GB were split into smaller per-patient files to improve search, access, and load times and to reduce memory utilization during processing.
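The per-patient split described above can be sketched as a single streaming pass over the large CSV, so that the full file is never held in memory. This is an illustrative sketch rather than the script used in this effort: the function name `split_csv_by_patient`, the `patient_id` column, and the `flush_every` buffer size are all assumptions.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_by_patient(source, out_dir, key="patient_id", flush_every=100_000):
    """Stream `source` once and write one CSV per distinct value of `key`.

    Assumes `key` values are safe to use as file names (hypothetical schema).
    Returns the number of per-patient files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    buffers = defaultdict(list)  # patient id -> rows not yet written to disk
    started = set()              # patient ids whose file already has a header
    buffered = 0

    def flush(fieldnames):
        nonlocal buffered
        for pid, rows in buffers.items():
            # First write creates the file with a header; later writes append.
            mode = "a" if pid in started else "w"
            with (out_dir / f"{pid}.csv").open(mode, newline="") as fh:
                writer = csv.DictWriter(fh, fieldnames=fieldnames)
                if pid not in started:
                    writer.writeheader()
                    started.add(pid)
                writer.writerows(rows)
        buffers.clear()
        buffered = 0

    with open(source, newline="") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            buffers[row[key]].append(row)
            buffered += 1
            if buffered >= flush_every:
                flush(reader.fieldnames)
        flush(reader.fieldnames)  # write whatever is left in the buffers
    return len(started)
```

Buffering rows and flushing in batches bounds memory use while avoiding tens of thousands of simultaneously open file handles, which matters with roughly 55k patients.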
Compute Infrastructure: We used elastic cloud compute and storage services on Google Cloud Platform (GCP), including BigQuery and the Cloud Healthcare API. This effort can also be replicated on AWS and Azure.