Optimized Digital Image Processing (DIP) Approach to Solve Page Rotation Problem: Should We Always Rush to Deep Learning?

The Challenge

Documents often arrive in varying orientations, creating problems in automatic processing and data extraction. While many documents adhere to standardized formats, like invoices or intake forms, the inconsistency in orientation disrupts document classification, data extraction, and validation workflows.

This issue is typically caused by human error in scanning, varying departmental standards, or automated systems mishandling document preparation. Current solutions, including deep learning-based models and brute force methods, have proven either inefficient or computationally expensive.

Approach

We evaluated and compared the existing methods before developing an optimized solution for document rotation detection. The five points below trace the progression from brute force to our final solution:

Brute Force Approach: In this method, OCR is applied to each document four times, once for each possible orientation (North, South, East, West). While this method extracts the text, it is computationally inefficient, especially for large document volumes, due to the repeated OCR operations.

Deep Learning Model Approach: We trained multiple deep learning models to detect document orientations, achieving up to 94% accuracy. However, this method required two inferences per page (one for the deep learning model and another for OCR), resulting in high computational costs and longer processing times.

DFTOP Approach (Double Fourier Transform Optimized Process): This approach uses the Fast Fourier Transform (FFT) to analyze text frequency patterns, coupled with OCR for text extraction. This combination reduces the number of OCR operations, improving both efficiency and scalability while maintaining accuracy.

Observations and Insights: We observed distinct rotation patterns in human-scanned documents. These patterns enabled us to anticipate likely rotations based on user behavior, which informed the development of a more efficient detection method that reduces the need for multiple OCR passes.

Implementation Overview: The final approach uses Fourier Transforms for rotation detection, supported by Parseval’s theorem for signal processing, which allows us to simplify computations and determine the correct document orientation before running OCR. This hybrid technique minimizes computational costs and improves throughput.
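Parseval’s theorem, mentioned above, says that a signal carries the same energy whether it is measured on the samples or on their Fourier coefficients. The following is a minimal numerical check of the discrete form, not the production code; the function names `energy_time` and `energy_freq` are hypothetical, and the 1/N factor reflects NumPy's unnormalized forward FFT.

```python
import numpy as np

def energy_time(x):
    """Signal energy computed directly from the samples."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(x) ** 2))

def energy_freq(x):
    """Same energy computed from the DFT coefficients.

    NumPy's forward FFT is unnormalized, so the discrete Parseval
    identity is sum|x|^2 == (1/N) * sum|X|^2.
    """
    X = np.fft.fft(np.asarray(x, dtype=float))
    return float(np.sum(np.abs(X) ** 2) / len(X))

# Illustrative signal: 720 random samples (720 matches the resize dimension
# used later in the article; the data itself is synthetic).
rng = np.random.default_rng(0)
signal = rng.normal(size=720)
print(np.isclose(energy_time(signal), energy_freq(signal)))  # True
```

Because the two quantities agree, an energy comparison between two signals can be carried out in whichever domain is cheaper to compute.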

Example:

Consider a 4-page PDF document with different orientations for each page:

* Page 1: 0 degrees (expected orientation)
* Page 2: 90 degrees
* Page 3: 180 degrees
* Page 4: 270 degrees

The following table outlines the validation accuracy of several deep learning models used:

Implementation Details of Our Approach (DFTOP)

* Fourier Transform: Used to analyze the frequency patterns in document text, allowing us to detect the correct orientation without running OCR multiple times.

* Parseval’s Theorem: Applied for signal processing to simplify computations while detecting document alignment through frequency analysis.

We rely on Parseval’s theorem for the Fourier analysis and signal processing. Mathematically, for a continuous function f(t) with Fourier transform F(ω), Parseval’s theorem can be expressed as:
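In its standard form, and assuming the angular-frequency convention F(ω) = ∫ f(t) e^{-iωt} dt (the convention is an assumption here; other conventions move the 1/2π factor), the identity reads:

```latex
\int_{-\infty}^{\infty} \lvert f(t) \rvert^{2} \, dt
  \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} \lvert F(\omega) \rvert^{2} \, d\omega
```

The left side is the signal energy in the original domain and the right side is the same energy measured in the frequency domain.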

Steps Involved

1. Preprocessing: Documents are pre-processed using Fourier Transforms to identify key features.
2. Rotation Detection: The most probable orientation is calculated using frequency data.
3. Validation: Once the correct orientation is determined, OCR is run a single time on the correctly oriented document.

The Probability Distribution over different orientations/angles of the document pages:

This figure shows the pattern we expect from people using the system: one orientation (usually the correct one) has the highest probability. The misoriented pages are also biased, so one of the remaining orientations occurs more often than the other two.

The Algorithm

The following figure shows:

Path A: When the document is in the correct orientation, the horizontal Fourier transform graph shows low power.
Path B: When the document is not in the correct orientation (rotated by 90 degrees in this path), the horizontal Fourier transform graph shows very high power.

The figure illustrates how the algorithm detects text alignment through the following steps:

Step 1 – Grayscale Conversion: Convert each document page to grayscale for normalization.

Step 2 – Resizing: Resize to a standardized dimension (720×720 pixels).

Step 3 – Normalization: Generate vertical and horizontal components by normalizing the image.

Step 4 – Horizontal Frequency Analysis: Apply the FFT to the horizontal component to extract its frequency components.

Step 5 – Vertical Frequency Analysis: Apply the FFT to the vertical component to extract its frequency components.

Step 6 – Correct Document Orientation: Based on the frequency distribution, determine the correct document orientation.

When the text is correctly aligned, the vertical frequency component exhibits higher energy compared to the horizontal component. This phenomenon occurs because the spacing between lines and the lines themselves create a sine wave-like pattern, which is more pronounced in the vertical frequency domain.
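Steps 1 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the "vertical and horizontal components" are row-wise and column-wise mean intensity profiles, uses a nearest-neighbour resize as a stand-in for a real resampler, and all function names (`to_gray`, `resize_nearest`, `profile_energy`, `text_axis`) are hypothetical. It only separates the 0/180 degree pair from the 90/270 degree pair.

```python
import numpy as np

SIZE = 720  # target side length from Step 2

def to_gray(img):
    """Step 1: collapse an RGB array (H, W, 3) to grayscale; pass 2-D input through."""
    img = np.asarray(img, dtype=float)
    return img if img.ndim == 2 else img[..., :3].mean(axis=2)

def resize_nearest(gray, size=SIZE):
    """Step 2: nearest-neighbour resize to size x size (assumed stand-in resampler)."""
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[rows][:, cols]

def profile_energy(profile):
    """Steps 4-5: spectral energy of a 1-D profile, excluding the DC term."""
    spectrum = np.fft.rfft(profile - profile.mean())
    return float(np.sum(np.abs(spectrum[1:]) ** 2))

def text_axis(img):
    """Step 6: 'horizontal' if text lines run across the page (0 or 180 degrees),
    'vertical' if the page appears rotated by 90 or 270 degrees."""
    gray = resize_nearest(to_gray(img))
    span = gray.max() - gray.min()
    # Step 3: scale intensities to [0, 1]; a blank page becomes all zeros.
    norm = (gray - gray.min()) / span if span > 0 else np.zeros_like(gray)
    vertical_profile = norm.mean(axis=1)    # one value per row (varies down the page)
    horizontal_profile = norm.mean(axis=0)  # one value per column (varies across the page)
    e_vert = profile_energy(vertical_profile)
    e_horiz = profile_energy(horizontal_profile)
    return "horizontal" if e_vert >= e_horiz else "vertical"

# Synthetic page: dark horizontal bands imitate lines of text.
rows = np.arange(1000)
page = np.ones((1000, 800))
page[(rows % 40) < 15, :] = 0.0

print(text_axis(page))            # horizontal
print(text_axis(np.rot90(page)))  # vertical
```

Real scans also vary along the horizontal axis (margins, columns, word gaps), so a practical version would likely need a tuned decision rule rather than a plain comparison of the two energies.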

Analysis: Computational Efficiency

The tables in this section compare the efficiency of the different approaches to document orientation detection.

1. OCR Operations Required by Different Approaches

This table summarizes the number of OCR operations required by each approach as a function of the probability of each rotation, P(rot).

This table shows how the number of OCR operations varies for each approach, and highlights how the DFTOP approach reduces the need for multiple OCR passes.

2. Time Efficiency Comparison

This table compares the execution time required by each approach for a given document processing task, assuming 1 unit of time per OCR operation.

The time advantage of DFTOP over Brute Force is:

Also, since the execution time of DFTOP is bounded by 2 time units, it is always computationally more efficient than the DL approach.
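The dependence of the OCR count on P(rot) can be sketched under explicit assumptions. The sketch assumes (this is not stated in the article) that the FFT step always identifies the correct axis pair (0/180 versus 90/270), that OCR is tried first on the more probable member of that pair, and that a second pass is needed only when the first guess is wrong. The function name `expected_ocr_passes` and the probabilities in the usage lines are hypothetical and illustrative, not measured values.

```python
def expected_ocr_passes(p_rot):
    """Expected OCR passes per page for brute force and for the assumed DFTOP flow.

    p_rot maps a rotation in degrees (0, 90, 180, 270) to its probability.
    """
    total = sum(p_rot.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    brute_force = 4.0  # one OCR pass per orientation, regardless of P(rot)

    # Assumed DFTOP flow: the first OCR pass is wrong exactly when the page
    # is the less likely member of its axis pair, which costs one extra pass.
    first_guess_wrong = min(p_rot[0], p_rot[180]) + min(p_rot[90], p_rot[270])
    dftop = 1.0 + first_guess_wrong

    return {"brute_force": brute_force, "dftop": dftop}

# Illustrative (made-up) distribution: most pages upright, some upside down.
example = {0: 0.7, 90: 0.05, 180: 0.2, 270: 0.05}
passes = expected_ocr_passes(example)
print(passes["brute_force"] / passes["dftop"])  # speed-up factor over brute force
```

Under these assumptions a single page never needs more than two OCR passes, which matches the bound of 2 stated above, and the expected count approaches 1 as the distribution concentrates on one orientation per axis.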

Results

Our approach outperformed both the brute force and deep learning models in terms of computational efficiency and accuracy:

Conclusion

This case study demonstrates how our novel Fourier Transform-based approach solved the problem of document rotation detection with significantly fewer computational resources than traditional methods like brute force and deep learning models. It effectively balances accuracy and efficiency, making it scalable for large document volumes.

Future Work

1. Combining Deep Learning with FFT: Although our FFT-based method solves most page rotation issues, a combination of deep learning models with FFT can enhance performance for certain non-standard rotations.
2. Non-Standard Rotations: Explore solutions for handling irregular document rotations (e.g., non-90-degree rotations).
3. Scalable GPU Implementation: Implementing GPU acceleration can speed up the Fourier Transform processes even further, making the solution faster for larger datasets.
