Beyond OCR: AI-based Document Processing with AWS Textract

In the ever-evolving landscape of cloud services, businesses and developers are consistently looking for powerful tools that can transform workflows and increase efficiency. One standout service that has received significant attention is Amazon Web Services (AWS) Textract. This article explores the capabilities of AWS Textract, highlighting its features, benefits, and a specific use case that underscores its value in today’s digital ecosystem.

What is AWS Textract?

AWS Textract is a fully managed machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. Unlike simple OCR (Optical Character Recognition) solutions that only read text, Textract goes a step further by understanding the layout of the content and the relationships between the extracted pieces of information. This capability makes it an indispensable tool for organizations looking to digitize their document workflows without extensive manual effort.

Key features of AWS Textract

– Text and data extraction: Textract handles a variety of document formats, including forms and tables, and accurately extracts text and data from them.

– Handwriting recognition: One of the standout features of Textract is its ability to process handwritten notes, a feature that expands its applicability across industries.

– Form and table recognition: Identifies the structure of forms and tables, allowing for more organized data extraction and analysis.

– Integration and scalability: Easily integrates with other AWS services, providing a scalable solution that can grow with your business needs.

– Security and compliance: As with all AWS services, Textract ensures a high level of security and compliance, protecting your data throughout the extraction process.
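To make the form recognition feature more concrete, here is a minimal sketch of how key-value pairs could be read from a Textract response. It assumes the block structure returned by AnalyzeDocument with the FORMS feature (KEY_VALUE_SET blocks linked through VALUE and CHILD relationships). The helper names and the sample data are my own illustration, not real output from our project.

```python
from typing import Dict, List


def text_of(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the WORD children of a block into one string."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(blocks: List[dict]) -> Dict[str, str]:
    """Map the text of each KEY block to the text of its linked VALUE block."""
    by_id = {b["Id"]: b for b in blocks}
    fields: Dict[str, str] = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(text_of(by_id[i], by_id) for i in rel["Ids"])
        fields[text_of(block, by_id)] = value_text
    return fields


# Hand-written sample in the shape of an AnalyzeDocument response (not real output).
# In practice the blocks would come from something like:
#   boto3.client("textract").analyze_document(Document=..., FeatureTypes=["FORMS"])["Blocks"]
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "City:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Chicago"},
]

print(form_fields(SAMPLE_BLOCKS))
```

The point of the sketch is that Textract returns a flat list of blocks, and the form structure has to be reassembled by following the relationship IDs.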

Benefits of AWS Textract from a business perspective

– Efficiency and time savings: By automating the data extraction process, organizations can save significant amounts of time, freeing up resources for more critical tasks.

– Accuracy and reliability: Using advanced ML models, Textract provides high accuracy in data extraction, reducing errors associated with manual data entry.

– Cost-effective: Reduces the need for manual document processing, resulting in cost savings for organizations.

– Improved data analysis: By digitizing and structuring data, Textract enables deeper data analysis and insight.

Leveraging AWS Textract to improve insurance application processing

Our client, an insurance company, was faced with the challenge of processing handwritten insurance application forms. Traditional OCR solutions, including those from well-known document processing vendors, fell short due to the complex and variable nature of handwriting.

AWS Textract proved to be a good solution. Its machine learning models excel at deciphering handwritten content, a task that traditional OCR technologies struggle to perform with the required accuracy.

High accuracy and confidence scores

AWS Textract not only extracts text with remarkable accuracy but also assigns a confidence value (a score between 0 and 100) to each piece of extracted text. This feature is critical for our client because it allows them to automate the processing of application forms. Forms whose confidence scores meet or exceed our client’s threshold are processed automatically, which increases efficiency and minimizes manual review.

    {
      "BlockType": "LINE",
      "Confidence": 99.75729370117188,
      "Text": "Chicago",
      "Geometry": {
        "BoundingBox": {
          "Width": 0.10312779992818832,
          "Height": 0.02201165445148945,
          "Left": 0.36064571142196655,
          "Top": 0.5263997912406921
        },
        "Polygon": [
          {
            "X": 0.3606738746166229,
            "Y": 0.5263997912406921
          },
          {
            "X": 0.4637735188007355,
            "Y": 0.5264415740966797
          },
          {
            "X": 0.4637455642223358,
            "Y": 0.54841148853302
          },
          {
            "X": 0.36064571142196655,
            "Y": 0.5483691692352295
          }
        ]
      }
    }

API response showing Confidence value together with the handwritten input.
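As a small illustration of how such a block can be consumed, the sketch below reads the geometry and converts the bounding box into pixel coordinates, for example to highlight the recognized text on the scanned page. Textract reports bounding boxes as ratios of the page width and height. The function name and the page size used here are assumptions for the example.

```python
from typing import Tuple

# Abridged copy of the LINE block shown above (values rounded for readability).
LINE_BLOCK = {
    "BlockType": "LINE",
    "Confidence": 99.757,
    "Text": "Chicago",
    "Geometry": {
        "BoundingBox": {"Width": 0.1031, "Height": 0.0220,
                        "Left": 0.3606, "Top": 0.5264}
    },
}


def to_pixels(block: dict, page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Scale the relative BoundingBox to (left, top, right, bottom) in pixels."""
    box = block["Geometry"]["BoundingBox"]
    left = round(box["Left"] * page_width)
    top = round(box["Top"] * page_height)
    right = round((box["Left"] + box["Width"]) * page_width)
    bottom = round((box["Top"] + box["Height"]) * page_height)
    return left, top, right, bottom


# Assumed scan size of 1000 x 2000 pixels, chosen only for this example.
print(LINE_BLOCK["Text"], LINE_BLOCK["Confidence"], to_pixels(LINE_BLOCK, 1000, 2000))
```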

Human-in-the-Loop for Validation

For application forms where Textract’s confidence scores are below our client’s desired threshold, a “human-in-the-loop” system is used. These forms are routed to human agents for review, ensuring accuracy across all processed documents. This approach seamlessly combines automated efficiency with the precision of human oversight.
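The routing decision itself can be sketched in a few lines. This is a simplified illustration, not our client's implementation: the threshold value, the function names, and the sample forms are assumptions, and a real setup would hand the flagged items to a review tool rather than return them.

```python
from typing import Dict, List

# Hypothetical threshold; the client's real value is not stated in this article.
CONFIDENCE_THRESHOLD = 95.0


def low_confidence_lines(blocks: List[dict], threshold: float) -> List[dict]:
    """Return the LINE blocks whose confidence is below the threshold."""
    return [b for b in blocks
            if b["BlockType"] == "LINE" and b["Confidence"] < threshold]


def route_form(blocks: List[dict],
               threshold: float = CONFIDENCE_THRESHOLD) -> Dict[str, object]:
    """Decide whether a form is processed automatically or sent to a reviewer."""
    flagged = low_confidence_lines(blocks, threshold)
    return {
        "route": "human_review" if flagged else "auto",
        "review_items": [(b["Text"], b["Confidence"]) for b in flagged],
    }


# Invented sample forms: one clean, one with a doubtful line.
CLEAN_FORM = [
    {"BlockType": "LINE", "Text": "Chicago", "Confidence": 99.76},
    {"BlockType": "LINE", "Text": "Jane Doe", "Confidence": 98.10},
]
DOUBTFUL_FORM = [
    {"BlockType": "LINE", "Text": "Chicago", "Confidence": 99.76},
    {"BlockType": "LINE", "Text": "J. Dae", "Confidence": 71.40},
]

print(route_form(CLEAN_FORM))
print(route_form(DOUBTFUL_FORM))
```

In this sketch a single low-confidence line is enough to send the whole form to review. Other policies (for example, checking only selected fields) are equally possible and depend on the use case.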

Pre-built UI provided by AWS for demonstration purposes

Going further

The features and capabilities described above are just the beginning. In addition to analyzing documents from a layout perspective, Textract can also answer questions about the content that you ask in natural language (the Queries feature). In other words, it interprets the content of the document on a semantic level. There is a lot of potential in this, as it allows data to be extracted even when the form type is unknown or varies.

Example of a German and French language form to transfer car license plates. Textract understands the content and provides an appropriate answer.
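A question-based extraction could look roughly like the following sketch. It assumes the AnalyzeDocument request shape with the QUERIES feature type and the QUERY and QUERY_RESULT blocks in the response. The question text, the alias, and the sample answer are invented for illustration and are not taken from the form shown above.

```python
from typing import Dict, List


def build_query_request(image_bytes: bytes, questions: Dict[str, str]) -> dict:
    """Build keyword arguments for an AnalyzeDocument call with QUERIES enabled.

    `questions` maps a short alias to a natural-language question.
    """
    return {
        "Document": {"Bytes": image_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": text, "Alias": alias}
                        for alias, text in questions.items()]
        },
    }


def query_answers(blocks: List[dict]) -> Dict[str, str]:
    """Map each query alias to the text of its first QUERY_RESULT block."""
    by_id = {b["Id"]: b for b in blocks}
    answers: Dict[str, str] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        alias = block["Query"].get("Alias", block["Query"]["Text"])
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER" and rel["Ids"]:
                answers[alias] = by_id[rel["Ids"][0]]["Text"]
    return answers


REQUEST = build_query_request(
    b"<scanned form bytes>",
    {"plate": "What is the license plate number?"},
)
# With AWS credentials configured, the request could be sent with:
#   blocks = boto3.client("textract").analyze_document(**REQUEST)["Blocks"]

# Hand-written response blocks (not real output) to show the parsing step.
SAMPLE_RESPONSE_BLOCKS = [
    {"Id": "q1", "BlockType": "QUERY",
     "Query": {"Text": "What is the license plate number?", "Alias": "plate"},
     "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}]},
    {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "ZH 123456", "Confidence": 97.0},
]

print(query_answers(SAMPLE_RESPONSE_BLOCKS))
```

Because the question is phrased in plain language, the same request can be reused across form variants without describing where the field sits on the page.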

Key Advantages and Considerations

I see the following advantages and challenges from an operational and technical perspective:

Advantages:

– Solid handwriting recognition: In our tests the accuracy was about 20 times better than using a traditional tool.

– No configuration required: There is no need to configure for a specific form or document structure. As a result, it’s resilient to changes in the layout of the forms it processes.

– Scalability: Provided by AWS as a managed service on a pay-per-use basis.

Challenges:

– Not available in all regions: Notably, it is not yet available in the AWS Switzerland region. This may be an issue for some applications.

– Complex pricing model: Pricing depends on the features used (such as text extraction, table extraction, form extraction) and varies by a factor of 40. Therefore, it’s critical to analyze the use case to get reliable pricing information.
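To show why the feature choice matters for the budget, here is a small back-of-the-envelope sketch. The prices are placeholder relative units, not actual AWS list prices. Only the spread of 1 to 40 is taken from the observation above, and the tier names and page volume are assumptions.

```python
# Placeholder relative prices per 1,000 pages, NOT actual AWS list prices.
# Only the 1:40 spread comes from the article; the tier names are illustrative.
RELATIVE_PRICE_PER_1000_PAGES = {
    "cheapest_feature": 1.0,
    "most_expensive_feature": 40.0,
}


def monthly_cost(pages_per_month: int, price_per_1000_pages: float) -> float:
    """Pay-per-use cost for a month: pages / 1000 * price per 1,000 pages."""
    return pages_per_month / 1000 * price_per_1000_pages


# Assumed volume of 50,000 pages per month.
for feature, price in RELATIVE_PRICE_PER_1000_PAGES.items():
    print(feature, monthly_cost(50_000, price))
```

For a real estimate, the placeholder values would have to be replaced with the current prices for the features and region actually used.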

Conclusion

The integration of AWS Textract goes beyond traditional OCR capabilities. For our client, Textract is a transformative tool that enables them to efficiently process handwritten application forms with high accuracy. This example of document processing innovation demonstrates how advanced technology can overcome long-standing industry challenges and set new standards for operational efficiency and customer service.