© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Agenda
• Documents Processing
• Amazon Textract
• Overview
• Amazon Textract APIs
• Demo
• Reference Architectures
• How to Get Started
Documents are important
Primary tool of record keeping, communicating, collaborating, and transacting
Finance, Insurance, Accounting, Medical, Legal, Education
16.3M US mortgage applications ($2.1T) in 2016
About 240M W2 tax forms will be processed for
FY2018 in the US
Need for processing documents
How documents are processed today
Challenges for processing documents
Manual processing
Variable output
Inconsistent results
Challenges for processing documents
Optical Character Recognition (OCR)
No multi-column detection
No rotated text detection (not shown)
No stylized font detection (not shown)
Output
Extract data quickly & accurately
No code templates to maintain
Challenges for processing documents
Optical Character Recognition (OCR)
Output
Start Date End Date Employer Name Position Held Reason for leaving
1/15/2009 6/30/2013 Any Company Head Baker Family Relocated
Challenges for processing documents
Rules and template-based extraction
The well-known W2 US tax form has 100s of variants each year
It looks easy, but …
Introducing Amazon Textract
Extract text and data from virtually any document
Amazon Textract features
Amazon Textract – Text Extraction
Amazon Textract—Text Extraction API
DetectDocumentText
Relationships CHILD
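The CHILD relationship noted above can be walked in a few lines of Python. This is a minimal sketch, assuming the response is a dict with a "Blocks" list in which LINE blocks point at their WORD blocks through CHILD relationships. The sample response is hand-written for illustration, and the helper names child_ids and lines_with_words are hypothetical, not part of any SDK.

```python
# Hand-written sample shaped like a DetectDocumentText response.
# Ids and text are illustrative only, not real service output.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1", "l2"]}]},
        {"BlockType": "LINE", "Id": "l1", "Text": "Hello world",
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"BlockType": "LINE", "Id": "l2", "Text": "Textract",
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"BlockType": "WORD", "Id": "w1", "Text": "Hello"},
        {"BlockType": "WORD", "Id": "w2", "Text": "world"},
        {"BlockType": "WORD", "Id": "w3", "Text": "Textract"},
    ]
}


def child_ids(block):
    """Return the Ids listed under the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def lines_with_words(response):
    """Pair each LINE's text with the texts of its child WORD blocks."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    result = []
    for block in response["Blocks"]:
        if block["BlockType"] == "LINE":
            words = [by_id[i]["Text"] for i in child_ids(block)]
            result.append((block["Text"], words))
    return result


if __name__ == "__main__":
    for text, words in lines_with_words(sample_response):
        print(text, "->", words)
```

Indexing the blocks by Id first keeps each relationship lookup cheap, which matters on dense pages with many WORD blocks.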
DetectDocumentText
Request Response
Amazon Textract—
Text Extraction simplified
Multi-column detection
Output
Extract data quickly & accurately
No code or templates to maintain
Demo
Amazon Textract – Table Extraction
Amazon Textract—Table Extraction API
AnalyzeDocument with "TABLES" in the FeatureTypes parameter
Relationships CHILD
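For tables, the same CHILD relationships link a TABLE block to its CELL blocks and each CELL to its WORD blocks. The sketch below rebuilds rows from such a response, assuming each CELL carries 1-based RowIndex and ColumnIndex fields. The sample data reuses values from the employment-history example in this deck, and the helper names child_ids and tables_as_rows are hypothetical.

```python
# Hand-written sample shaped like an AnalyzeDocument (TABLES) response.
# Ids are illustrative; the cell text comes from the deck's example table.
sample_response = {
    "Blocks": [
        {"BlockType": "TABLE", "Id": "t1",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"BlockType": "CELL", "Id": "c1", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"BlockType": "CELL", "Id": "c2", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"BlockType": "CELL", "Id": "c3", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
        {"BlockType": "CELL", "Id": "c4", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w6"]}]},
        {"BlockType": "WORD", "Id": "w1", "Text": "Start"},
        {"BlockType": "WORD", "Id": "w2", "Text": "Date"},
        {"BlockType": "WORD", "Id": "w3", "Text": "End"},
        {"BlockType": "WORD", "Id": "w4", "Text": "Date"},
        {"BlockType": "WORD", "Id": "w5", "Text": "1/15/2009"},
        {"BlockType": "WORD", "Id": "w6", "Text": "6/30/2013"},
    ]
}


def child_ids(block):
    """Return the Ids listed under the block's CHILD relationships."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def tables_as_rows(response):
    """Return one list of rows per TABLE block; each row is a list of cell texts."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    tables = []
    for block in response["Blocks"]:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            text = " ".join(
                by_id[i]["Text"] for i in child_ids(cell)
                if by_id[i]["BlockType"] == "WORD"
            )
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = text
        if not cells:
            continue  # a table with no cells has nothing to lay out
        n_rows = max(r for r, _ in cells)
        n_cols = max(c for _, c in cells)
        # Missing positions become empty strings so every row has equal width.
        tables.append([
            [cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)
        ])
    return tables


if __name__ == "__main__":
    for table in tables_as_rows(sample_response):
        for row in table:
            print(row)
```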
AnalyzeDocument - Tables
Request Response
Amazon Textract—
Table Extraction simplified
Table recognized
Output {
Start Date: 1/15/2009
End Date: 6/30/2013
Employer Name: Any Company
Position Held: Head Baker
Reason for leaving: Family relocated
}
Demo
Amazon Textract – Form Extraction
Amazon Textract—Forms Extraction API
AnalyzeDocument with "FORMS" in the FeatureTypes parameter
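Form results arrive as KEY_VALUE_SET blocks. A rough sketch of pairing keys with values follows, assuming a key block is marked with EntityTypes ["KEY"], points at its value block through a VALUE relationship, and both reach their text through CHILD relationships. Checkboxes are assumed to appear as SELECTION_ELEMENT children with a SelectionStatus field. The sample response is hand-written, and block_text and key_value_pairs are hypothetical helper names.

```python
# Hand-written sample shaped like an AnalyzeDocument (FORMS) response.
# Ids are illustrative; field names echo the form example in this deck.
sample_response = {
    "Blocks": [
        {"BlockType": "KEY_VALUE_SET", "Id": "k1", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"BlockType": "KEY_VALUE_SET", "Id": "v1", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"BlockType": "KEY_VALUE_SET", "Id": "k2", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                           {"Type": "CHILD", "Ids": ["w3"]}]},
        {"BlockType": "KEY_VALUE_SET", "Id": "v2", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
        {"BlockType": "KEY_VALUE_SET", "Id": "k3", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v3"]},
                           {"Type": "CHILD", "Ids": ["w4"]}]},
        {"BlockType": "KEY_VALUE_SET", "Id": "v3", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["s2"]}]},
        {"BlockType": "WORD", "Id": "w1", "Text": "MM:"},
        {"BlockType": "WORD", "Id": "w2", "Text": "01"},
        {"BlockType": "WORD", "Id": "w3", "Text": "Male"},
        {"BlockType": "WORD", "Id": "w4", "Text": "Female"},
        {"BlockType": "SELECTION_ELEMENT", "Id": "s1", "SelectionStatus": "SELECTED"},
        {"BlockType": "SELECTION_ELEMENT", "Id": "s2", "SelectionStatus": "NOT_SELECTED"},
    ]
}


def block_text(block, by_id):
    """Join the text of a block's CHILD words; checkboxes yield their status."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(response):
    """Map each KEY block's text to the text of the VALUE block(s) it points at."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(block_text(by_id[i], by_id) for i in rel["Ids"])
        pairs[block_text(block, by_id)] = " ".join(values)
    return pairs


if __name__ == "__main__":
    for key, value in key_value_pairs(sample_response).items():
        print(key, value)
```

Keys that repeat on a page would overwrite each other in this dict-based sketch; a list of pairs is a safer return type for real forms.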
AnalyzeDocument - Forms
Request Response
Amazon Textract—
Form Extraction simplified
Logical groupings captured
Relationships captured
Date of Birth:
MM: 01
DD: 01
YYYY: 1971
Gender:
Male: True
Female: False
Demo
Amazon Textract
Sync and async
Supports single-page documents such as images (e.g., mobile capture)
Asynchronous APIs
• StartDocumentTextDetection
• StartDocumentAnalysis
Request Response
Asynchronous APIs
• GetDocumentTextDetection
• GetDocumentAnalysis
Request
Response
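A Get call for an asynchronous job is typically repeated: first until the job is no longer in progress, then once per page of results. The sketch below illustrates that loop, assuming a client object that exposes get_document_text_detection(JobId=..., NextToken=...) and returns a dict with JobStatus, Blocks, and an optional NextToken, as the boto3 Textract client does. FakeTextractClient is a hypothetical stand-in so the loop can run offline, and collect_text_detection_blocks is an illustrative name.

```python
import time


def collect_text_detection_blocks(client, job_id, poll_seconds=5.0,
                                  max_polls=60, sleep=time.sleep):
    """Poll until the job leaves IN_PROGRESS, then follow NextToken for all Blocks."""
    for _ in range(max_polls):
        page = client.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} still in progress after {max_polls} polls")

    # For simplicity this sketch treats every status other than SUCCEEDED as an error.
    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"job {job_id} ended with status {page['JobStatus']}")

    blocks = list(page.get("Blocks", []))
    while "NextToken" in page:
        page = client.get_document_text_detection(
            JobId=job_id, NextToken=page["NextToken"]
        )
        blocks.extend(page.get("Blocks", []))
    return blocks


class FakeTextractClient:
    """Hypothetical offline stand-in: one IN_PROGRESS reply, then two result pages."""

    def __init__(self):
        self.calls = 0

    def get_document_text_detection(self, JobId, NextToken=None):
        self.calls += 1
        if self.calls == 1:
            return {"JobStatus": "IN_PROGRESS"}
        if NextToken is None:
            return {"JobStatus": "SUCCEEDED",
                    "Blocks": [{"BlockType": "LINE", "Text": "page 1"}],
                    "NextToken": "t1"}
        return {"JobStatus": "SUCCEEDED",
                "Blocks": [{"BlockType": "LINE", "Text": "page 2"}]}


if __name__ == "__main__":
    found = collect_text_detection_blocks(
        FakeTextractClient(), "job-123", sleep=lambda seconds: None
    )
    print([b["Text"] for b in found])
```

Polling is the simplest option for a demo. When a notification channel is configured on the Start call, waiting for the completion message avoids the repeated Get requests.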
API Reference – AWS CLI
• DetectDocumentText
aws textract detect-document-text --document '{"S3Object":{"Bucket":"textract-demo-images","Name":"simple_text_document.jpg"}}'
• AnalyzeDocument - Forms
aws textract analyze-document --document '{"S3Object":{"Bucket":"textract-demo-images","Name":"employmentapp.png"}}' --feature-types "FORMS"
• AnalyzeDocument - Tables
aws textract analyze-document --document '{"S3Object":{"Bucket":"textract-demo-images","Name":"DenseTextwithTable.png"}}' --feature-types "TABLES"
API Reference – AWS CLI
• StartDocumentTextDetection
aws textract start-document-text-detection \
  --document-location '{"S3Object":{"Bucket":"textract-demo-images","Name":"AfterVisitSummaryExample.pdf"}}' \
  --notification-channel '{"SNSTopicArn":"arn:aws:sns:us-east-1:<aws-account-id>:SNSDemoTest","RoleArn":"arn:aws:iam::<aws-account-id>:role/SNSFullAccess"}'
• GetDocumentTextDetection
aws textract get-document-text-detection --max-results 5 --job-id <job-id>
Amazon Textract
Under the hood
Text Extraction: OCR reimagined
Orientation
Structure variability
Document variability
Beyond OCR: Segmentation and rectification
Photometric
Geometric
Beyond OCR: Table and cell detection
Beyond OCR: Field name (key) and value Extraction
Output
Full Name:
First: John
Middle: X
Last: Doe
Date of Birth:
MM: 01
DD: 01
YYYY: 1971
Gender:
Male: True
Female: False
Beyond OCR: Inferring key/value association
Reference Architectures
Reference architecture—Index and search documents
Reference architecture—Extract for NLP
Reference architecture—Form capture
Amazon Textract
Launch customers
Amazon Textract
Benefits
Amazon Textract
Pricing
Amazon Textract
Free Tier
Features Free for first three months
Table Detection
Textract
Regions
Amazon Textract
Preview
LEARN MORE or SIGN UP
https://pages.awscloud.com/textract-preview.html
Resources
Thank you!