EHR Data Normalization Using FHIR Standards 

The Challenge

Fast Healthcare Interoperability Resources (FHIR) is an industry standard for normalizing healthcare data so that it can be used for AI/ML model development. In this effort, we compared two approaches to converting EHR data to FHIR:
1. NextGen Connect
2. A custom Python script

The data covers 55k patient records comprising 360 million observations, 21 million medication administrations, 4 million medication requests, 120k encounters, 2 million diagnostic reports, and 250k procedures, all converted into a FHIR-based target canonical structure.



Data Source and Format: The input data is relational, was provided to us in CSV format, and does not contain any personally identifiable information (PII) or protected health information (PHI).


Data Pre-processing: All files larger than 50 GB were split into smaller per-patient files to speed up search, access, and loading and to reduce memory use during processing.
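
The per-patient split described above could be sketched roughly as follows. This is a minimal illustration, not the script used in this effort: the function name, the `patient_id` column, and the `flush_every` batch size are all hypothetical, and it assumes the output directory starts empty and that patient identifiers are safe to use as file names.

```python
import csv
import os
from collections import defaultdict


def split_csv_by_patient(src_path, out_dir, patient_col="patient_id",
                         flush_every=100_000):
    """Stream a large CSV and write one smaller CSV per patient.

    Rows are buffered in memory and appended to per-patient files in
    batches, so the source file is never loaded in full.
    Returns the number of distinct patients written.
    """
    os.makedirs(out_dir, exist_ok=True)
    buffers = defaultdict(list)  # patient id -> rows not yet written
    seen = set()                 # patients whose file already has a header

    def flush(fieldnames):
        for pid, rows in buffers.items():
            path = os.path.join(out_dir, f"{pid}.csv")
            # Append mode: assumes out_dir was empty before this run.
            with open(path, "a", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=fieldnames)
                if pid not in seen:
                    writer.writeheader()
                    seen.add(pid)
                writer.writerows(rows)
        buffers.clear()

    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        buffered = 0
        for row in reader:
            buffers[row[patient_col]].append(row)
            buffered += 1
            if buffered >= flush_every:
                flush(reader.fieldnames)
                buffered = 0
        flush(reader.fieldnames)  # write whatever is left
    return len(seen)
```

Buffering and appending in batches avoids holding tens of thousands of file handles open at once, which matters when the number of patients is large.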


Compute Infrastructure: We used elastic cloud compute and storage services on Google Cloud Platform (GCP), such as BigQuery and the Cloud Healthcare API. This effort can also be replicated on AWS and Azure.


Data Mapping and Transformation: The data was transformed into FHIR resources by mapping values from the input relational structure to the standard fields in FHIR (STU3).
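
To make the mapping step concrete, here is a minimal sketch of turning one relational row into a FHIR STU3 Observation expressed as a plain dictionary. The input column names (`observation_id`, `loinc_code`, `test_name`, `patient_id`, `encounter_id`, `observed_at`, `value`, `unit`) are hypothetical, as are the assumptions that codes are already LOINC codes and that every record has status `final`; the real mapping depends on the source schema.

```python
def row_to_observation(row):
    """Map one relational row (a dict) to a FHIR STU3 Observation dict."""
    obs = {
        "resourceType": "Observation",
        "id": row["observation_id"],
        "status": "final",  # assumption: all source records are final
        "code": {
            "coding": [{
                "system": "http://loinc.org",  # assumption: LOINC-coded input
                "code": row["loinc_code"],
                "display": row["test_name"],
            }]
        },
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["observed_at"],
    }
    # STU3 links the encounter through `context` (R4 renamed it `encounter`).
    if row.get("encounter_id"):
        obs["context"] = {"reference": f"Encounter/{row['encounter_id']}"}
    # Only emit a quantity when the source actually carries a value.
    if row.get("value") not in (None, ""):
        quantity = {"value": float(row["value"])}
        if row.get("unit"):
            quantity["unit"] = row["unit"]
        obs["valueQuantity"] = quantity
    return obs
```

Keeping each resource type in its own small function like this makes the mapping rules easy to review with clinical domain experts.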


FHIR Validation: The converted data was validated with pre-built Python packages for consistency and correctness of format.
Similar operations were performed in NextGen Connect, with the mapping written in JavaScript.
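
As a rough idea of what validation in the Python approach checks, the sketch below tests that a converted resource carries the elements FHIR STU3 marks as required for a few resource types. It is a hand-written stand-in for illustration only, not the pre-built packages used in this effort, and it covers far less than a full validator (no data types, value sets, or cardinality limits). The names `REQUIRED_FIELDS` and `check_resource` are hypothetical.

```python
# Required top-level elements for a few STU3 resource types.
# Illustrative subset only; a real validator checks much more.
REQUIRED_FIELDS = {
    "Observation": ["status", "code"],
    "DiagnosticReport": ["status", "code"],
    "Procedure": ["status", "subject"],
    "Encounter": ["status"],
}


def check_resource(resource):
    """Return a list of problems; an empty list means the basic checks passed."""
    rtype = resource.get("resourceType")
    if rtype not in REQUIRED_FIELDS:
        return [f"unsupported or missing resourceType: {rtype!r}"]
    problems = []
    for field in REQUIRED_FIELDS[rtype]:
        # Treat absent and empty values the same way.
        if resource.get(field) in (None, "", [], {}):
            problems.append(f"{rtype}.{field} is required but missing")
    return problems
```

Running such a check on every converted resource and collecting the problems per source file gives a quick signal of where the mapping is incomplete before the data is loaded into a FHIR store.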


Advantages of Script
A custom script can be faster than working with a commercial tool when the data engineering team is not familiar with the product.
FHIR resource validation is relatively easier with a custom script, which can leverage open-source packages or packages built in-house.
Custom scripts can be extended with new features as needed, while the tool may be limited to its existing features and involves a steep learning curve.

Advantages of Tool
The commercial product handles various data challenges through easy-to-use GUI capabilities, making it accessible to non-technical functional users.
With the help of documentation and instructions, data transformation and mapping activities can be easier to replicate within the tool than with a proprietary custom script.


Interoperability is a challenge that the healthcare industry has been trying to solve since EHR systems were first implemented. Several open-source technical solutions have adopted interoperability standards such as FHIR to normalize and prepare data for AI and ML model development. However, data mapping and transformation is a crucial step that requires clinical domain expertise from the hospital where the data was created, because of the idiosyncrasies specific to that hospital.