PDF Parsing API

Summary

PDF Parsing lets you extract data from bank statements and transaction confirmations in PDF format. You select the endpoint that matches the document you wish to parse and upload the encoded document. Our system then extracts the key data into JSON or XML for easy processing. Additionally, our parsing algorithms verify the credibility and integrity of the information contained in the documents.

How to integrate

Integrating PDF Parsing is straightforward: upload a bank statement in PDF format, encoded as “Base64” (plain text), to one of our API endpoints. In response, you’ll receive the data extracted from the document.
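The upload step can be sketched in Python as follows. This is a minimal illustration, not the definitive client: the endpoint URL, the `X-Api-Key` header and the `document` field name are hypothetical placeholders, and the real names must be taken from the API documentation. Only the Base64 encoding itself follows directly from the description above.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL comes from the API documentation.
ENDPOINT_URL = "https://api.example.com/pdf/statement"


def encode_pdf(pdf_bytes: bytes) -> str:
    """Return the PDF content as Base64 plain text."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def build_upload_request(pdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the encoded document.

    The JSON body layout and the header name are assumptions made for this sketch.
    """
    body = json.dumps({"document": encode_pdf(pdf_bytes)}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

A request built this way could be sent with `urllib.request.urlopen(request)`. Reading the file in binary mode (`open(path, "rb")`) matters, because Base64 must be applied to the raw bytes of the PDF.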

See our documentation for more details.

Extracting data

We can only extract as much information as is available in a given document, and this varies from bank to bank. In most cases, the data we can extract includes:

From statements (monthly statements and account history documents):

From transaction confirmations:

For more detailed information on the data returned, please refer to our Kontomatik Services & Coverage document, which specifies the data available in each supported bank.

Parsing different document types

The two supported document types, statements and transaction confirmations, differ as follows.

Supported countries

  • Statements: Poland, Latvia
  • Transaction confirmations: Poland

Supported types of documents

  • Statements: monthly statements and account history
  • Transaction confirmations: confirmations of regular transfers and confirmations of BLIK payments

Features

Statements:

  • Extracting KYC data, accounts and transactions
  • Verifying possible document modification (fraud detection)
  • Returning transaction labels (optional)
  • Getting owner score (optional)

Transaction confirmations:

  • Extracting sender and recipient information
  • Determining the status of a transaction and who generated it
  • Extracting transaction details
  • Verifying possible document modification (fraud detection)
  • Returning transaction labels (optional)

Not supported

Statements:

  • Statements from company/corporate accounts (other than sole traders, i.e. “Jednoosobowa działalność gospodarcza”)
  • Statements in English or other languages (the layout needs to be in Polish)
  • Encrypted statements
  • Scans or other documents not downloaded directly from online banking websites

Transaction confirmations:

  • All other types of confirmations

Response format

  • Statements: XML
  • Transaction confirmations: JSON

Ability to operate on parsed data

  • Statements: Yes
  • Transaction confirmations: No. Data extracted from a confirmation is returned only immediately after parsing. It won’t be available in Insight, and you won’t be able to use features based on aggregated data, such as Data Analysis (e.g. Scoring or Data Summary) or Data processing.
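Because encrypted files and non-PDF uploads are not supported, a client may want to reject obviously unsuitable files before uploading them. The sketch below is a hypothetical client-side pre-check, not part of the API: the function name is invented for illustration, and the checks are heuristics based on the general PDF file layout (a `%PDF-` header near the start of the file and an `/Encrypt` entry in the trailer of encrypted files). It cannot detect scans, and it does not replace the server-side verification.

```python
from typing import List


def precheck_pdf(pdf_bytes: bytes) -> List[str]:
    """Return reasons the file is likely to be rejected (empty list if none found)."""
    problems = []
    # PDF files carry a "%PDF-" header; readers commonly accept it
    # anywhere within the first 1024 bytes.
    if b"%PDF-" not in pdf_bytes[:1024]:
        problems.append("not a PDF file")
    # Encrypted PDFs reference their encryption dictionary through an
    # /Encrypt entry. A raw byte search is only a heuristic and can
    # produce false positives.
    if b"/Encrypt" in pdf_bytes:
        problems.append("appears to be encrypted")
    return problems
```

An empty result means only that no obvious problem was found, so the response from the parser remains the authoritative answer.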

Anti-tampering verification

Verification aims to hinder malicious attempts at tampering with PDF documents. Our algorithms verify:

It’s not guaranteed that checking these properties is enough to spot fraud, but based on our analysis these are the most common indicators.

Sometimes, someone might accidentally edit the PDF document. In such situations, it’s best to ask the end-user to download the document again and not to open it before uploading it to the parser.

PDF Parsing Widget

Please note that the PDF Parsing Widget is available only for statements. Transaction confirmations can be parsed only via the API.

Our PDF Parsing Widget is a front-end widget that makes the process of parsing the bank statement even more seamless. It is available as part of our SaaS solution.

The end-user chooses the bank and uploads bank statements as a PDF file. You can then get the extracted information using our data endpoint or review it in Insight.

On-premise deployment

Our PDF Parsing solution is mainly used in the SaaS model, but we also offer on-premise deployments. This option is designed for clients who don’t want their data to pass through servers other than their own.

This option does, however, have some disadvantages:

To find out more about this solution, contact our Sales team.

FAQ

PDF parsing is a general term for a class of methods that extract human-readable plain text from PDF documents.

Our proprietary PDF Parsing solution is designed specifically for handling bank documents. It automatically recognises transactions, KYC and other related data in a bank statement and extracts them for further processing.

Moreover, our algorithm always checks whether the statement has been edited, whether the data is consistent, whether the digital signature is valid and whether other expected features are in place, so that you can protect yourself from accepting statements that have been tampered with.

It depends on what is important to you and your end-users. In both cases, you will get the data in the same format. The main difference is that PDF Parsing doesn’t require the user to log in to online banking via our widget. As a result, the end-user has more flexibility in how they obtain the bank statement.

On the other hand, the Banking API combined with our widget makes the whole process of providing the data by the end-user easier.

Finally, the PDF Parsing solution can run entirely on your servers (on-premise), in contrast to the Banking API, which is served only via the cloud (SaaS)*.

*The on-premise PDF Parsing solution requires a contract and fees separate from the SaaS version. We recommend it only to big companies with highly developed infrastructure and IT security teams.

To parse a PDF bank statement, upload it to the statement endpoint, encoded using “Base64”. The results are returned in XML format.

If you choose our SaaS solution, you can also use our front-end widget to upload a PDF without encoding it and get the XML later using the data endpoint.
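Once the XML comes back, it can be processed with any standard XML library. The following sketch uses Python’s built-in `xml.etree.ElementTree`. The sample payload and its element names (`reply`, `account`, `transaction`, `date`, `amount`) are invented for illustration only and are an assumption, so consult the documentation for the real response schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sample: element names are assumptions, not the real schema.
SAMPLE_XML = """
<reply>
  <account>
    <transaction><date>2024-01-05</date><amount>-120.50</amount></transaction>
    <transaction><date>2024-01-09</date><amount>3000.00</amount></transaction>
  </account>
</reply>
"""


def extract_transactions(xml_text: str):
    """Return (date, amount) pairs for every <transaction> element found."""
    root = ET.fromstring(xml_text.strip())
    return [
        # Decimal avoids the rounding errors floats introduce for money amounts.
        (tx.findtext("date"), Decimal(tx.findtext("amount")))
        for tx in root.iter("transaction")
    ]
```

Calling `extract_transactions(SAMPLE_XML)` yields one `(date, amount)` tuple per transaction in the sample, in document order.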


Documentation

For technical documentation, refer to our unified documentation that offers comprehensive support for customers integrating with AIS services, our PDF parser, and Data Analysis solutions. Discover detailed guidance on seamless integration with Kontomatik services and explore their full range of capabilities.

Contact

Sales

Do you need help understanding our products, costs, and cooperation options?

Technical Support

Do you have technical questions about our services or API integration?