PDF Parsing API

Summary

PDF Parsing lets you extract data from bank statements and transaction confirmations in PDF format. Depending on the document you wish to parse, you select the appropriate endpoint and upload the Base64-encoded document. Our system then extracts the key data and returns it as XML (statements) or JSON (transaction confirmations) for easy processing. Additionally, our parsing algorithms verify the credibility and integrity of the information contained in the documents.

How to integrate

Integrating PDF Parsing is straightforward. You upload a bank statement in PDF format, encoded as Base64 (plain text), to one of our API endpoints. In response, you receive the data extracted from the document.
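The upload step can be sketched as follows. This is a minimal illustration only: the endpoint URL, the JSON field name `file` and the `X-Api-Key` header are placeholders assumed for the example, not the actual API contract, so take the real values from the documentation.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
STATEMENT_ENDPOINT = "https://api.example.com/pdf/statement"


def encode_pdf(pdf_bytes: bytes) -> str:
    """Return the PDF content as Base64 plain text (ASCII)."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def build_upload_request(pdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the encoded PDF."""
    # "file" is an assumed field name, not taken from the real API.
    body = json.dumps({"file": encode_pdf(pdf_bytes)}).encode("utf-8")
    return urllib.request.Request(
        STATEMENT_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumed header name
        },
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the encoding step easy to test without network access.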

See our documentation for more details.

Extracting data

We can only extract as much information as a given document contains, and this differs from bank to bank. In most cases, the data we can extract includes:

From statements (monthly statements and account history documents):

KYC data,
accounts,
transactions

From transaction confirmations:

status of the transaction,
who generated the confirmation,
sender/recipient information,
transaction details (amount, title, kind etc.)

For more detailed information on the data returned, please refer to our Kontomatik Services & Coverage document, which specifies the data available in each supported bank.
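On the client side, the confirmation data listed above could be mapped onto a typed record. The sketch below assumes a flat JSON object; the key names (`status`, `generatedBy`, `sender` and so on) and the sample values are invented for illustration and do not describe the real response schema.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Confirmation:
    status: str
    generated_by: str
    sender: str
    recipient: str
    amount: Decimal
    title: str
    kind: str


def parse_confirmation(raw_json: str) -> Confirmation:
    """Map a (hypothetical) JSON response onto a typed record."""
    data = json.loads(raw_json)
    return Confirmation(
        status=data["status"],
        generated_by=data["generatedBy"],
        sender=data["sender"],
        recipient=data["recipient"],
        # Build the amount from a string to avoid float rounding.
        amount=Decimal(data["amount"]),
        title=data["title"],
        kind=data["kind"],
    )


# Made-up sample payload; every key and value here is an assumption.
SAMPLE = """{
  "status": "COMPLETED",
  "generatedBy": "bank",
  "sender": "Jan Kowalski",
  "recipient": "Anna Nowak",
  "amount": "120.50",
  "title": "Invoice 2024/01",
  "kind": "TRANSFER"
}"""

confirmation = parse_confirmation(SAMPLE)
```

Because the available fields differ between banks, a real integration would probably treat most of these fields as optional rather than required.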

Parsing different document types

Document type	Statements	Transaction confirmations
Supported countries	Poland, Latvia	Poland
Supported types of documents	Monthly statements; account history	Confirmations of regular transfers; confirmations of BLIK payments
Features	Extracting KYC data, accounts and transactions; verifying possible document modification (fraud detection); returning transaction labels (optional); getting owner score (optional)	Extracting sender and recipient information; determining the status of a transaction and who generated it; extracting transaction details; verifying possible document modification (fraud detection); returning transaction labels (optional)
Not supported	Statements from company/corporate accounts (other than sole traders, i.e. “Jednoosobowa działalność gospodarcza”); statements in English or other languages (the layout needs to be in Polish); encrypted statements; scans or otherwise prepared documents not downloaded directly from online banking websites	All other types of confirmations
Response format	XML	JSON
Ability to operate on parsed data	Yes	No. Data extracted from a confirmation is returned only immediately after parsing; it won’t be available in Insight, and you won’t be able to use any features based on aggregated data, such as Data Analysis (e.g. Scoring or Data Summary) or Data processing.
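Since encrypted statements and non-PDF files are not supported, a client might run a cheap local check before uploading. The sketch below is a heuristic under stated assumptions, not part of the API: it relies on the standard `%PDF-` file header and on the `/Encrypt` key that encrypted PDFs reference, and it cannot detect scans or re-saved documents.

```python
def is_pdf(data: bytes) -> bool:
    """True if the data starts with the standard PDF header."""
    return data.startswith(b"%PDF-")


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary."""
    return b"/Encrypt" in data


def precheck(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not is_pdf(data):
        problems.append("not a PDF file")
    elif looks_encrypted(data):
        problems.append("PDF appears to be encrypted")
    return problems
```

A plain byte search can misfire if the string `/Encrypt` happens to occur elsewhere in the file, so a result from `precheck` is best used as a hint to show the end-user rather than as a hard rejection.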

Anti-tampering verification

Verification aims to hinder malicious attempts at tampering with PDF documents. Our algorithms verify:

Bank’s digital signature
Consistency of the account balance with transactions
PDF metadata characteristics
Fonts, color and size
Bank’s logotype
Consistency of the header period with transaction dates
Keywords
Document structure

Checking these properties is not guaranteed to catch every fraud attempt, but based on our analysis they are the most common indicators of tampering.
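Two of the checks listed above, balance consistency and period consistency, come down to simple arithmetic. The sketch below only illustrates the idea; it is not the verification algorithm itself, and the `Transaction` type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    booked_on: date
    amount: Decimal  # negative for debits, positive for credits


def balance_is_consistent(opening: Decimal, closing: Decimal,
                          transactions: list[Transaction]) -> bool:
    """Opening balance plus all transaction amounts must equal the closing balance."""
    total = sum((t.amount for t in transactions), Decimal("0"))
    return opening + total == closing


def dates_within_period(start: date, end: date,
                        transactions: list[Transaction]) -> bool:
    """Every transaction date must fall inside the period stated in the header."""
    return all(start <= t.booked_on <= end for t in transactions)
```

For example, an opening balance of 1000.00 with transactions of -250.00 and +100.50 is consistent only with a closing balance of 850.50. `Decimal` is used instead of `float` so that the equality comparison is exact.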

Sometimes an end-user modifies the PDF document accidentally. In that case, it’s best to ask them to download the document again and upload it to the parser without opening it first.

PDF parsing Widget

Our PDF Parsing Widget is a front-end widget that makes parsing bank statements even more seamless. It is available as part of our SaaS solution.

The end-user chooses the bank and uploads bank statements as PDF files. You can then get the extracted information using our data endpoint or review it in Insight.

Please note that the PDF Parsing Widget is available only for statements. Transaction confirmations can be parsed only via the API.

On-premise deployment

Our PDF parsing solution is mainly used in the SaaS model, but we also offer on-premise deployments. The on-premise option is designed for clients who don’t want their data to pass through any servers other than their own.

This option does, however, have some disadvantages:

you need to maintain infrastructure and security all on your own,
it’s not updated as frequently as the SaaS solution, and you’re in charge of the installation,
if any bugs arise, debugging is much harder, so it might take our developers longer to fix the problems.

To find out more about this solution, contact our Sales team.

FAQ

What is PDF Parsing?

PDF parsing is a general term for a class of methods that extract human-readable plain text from PDF documents.

Our proprietary PDF Parsing solution is designed specifically for handling bank documents. It automatically recognises transactions, KYC and other related data in a bank statement and extracts them for further processing.

Moreover, our algorithm checks that the statement has not been edited, that the data is consistent, that the digital signature is valid and that other expected features are in place, so that you can protect yourself from accepting statements that have been tampered with.

Why would I choose PDF Parsing over the Banking API?

It depends on what is important to you and your end-users. In both cases, you will get the data in the same format. The main difference is that PDF Parsing doesn’t require the user to log in to their online bank via our widget. As a result, the end-user has more flexibility in how they obtain the bank statement.

On the other hand, the Banking API combined with our widget makes the whole process of providing the data by the end-user easier.

Finally, the PDF Parsing solution can work entirely from your servers (on-premise), in contrast to the Banking API, which is served only via the cloud (SaaS)*.

*The on-premise PDF Parsing solution requires a contract and fees separate from those of the SaaS version. We recommend it only to big companies with highly developed infrastructure and IT security teams.

How can I parse a PDF bank statement?

To parse a PDF bank statement, upload it, encoded as Base64, to the statement endpoint. The results come back in XML format.

If you choose our SaaS solution, you can also use our front-end widget to upload a PDF without encoding it and get the XML later using the data endpoint.

