
Assign categories automatically using regex string matching

Yokoy Invoice allows you to automatically assign categories to invoice line items depending on a specific string pattern identified in the extracted invoice data.

Written by Yokoy Team
Updated over 2 months ago

You can use regular expressions to detect information on invoices that can be used to automatically assign categories to invoice line items. This improves automation and provides greater flexibility in customizing extraction logic.

For example, you may want to code all invoice line items with the correct expense category where Yokoy detects a specific code on the invoice.

⚠️ Caution

A regex rule extracts a single reference from an invoice and uses it to assign the same category to all line items. It doesn’t extract a separate reference for each line item.

Regex string matching takes precedence over any other assignment logic, including smart coding. Other logic applies only as a fallback: if a regex is defined for categories but no invoice reference matches the pattern, or Yokoy is unable to find a corresponding category, the category field is filled using other active logic such as smart coding or default values.

Perk automatically compares the regex-extracted reference to all values of the selected attribute on active categories in the company. When it finds a match, it assigns the matching category to all line items. If multiple matches are found, then the category field in the line items of the invoice is left empty.

Values must match exactly; fuzzy matching is not supported. This feature only works with PDF invoices; XML-based e-invoices are not supported.

Multiple regex rules can be defined for categories. If multiple rules match values for the same invoice, no category is assigned and the field is left empty.
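The matching rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Yokoy's implementation: the function `assign_category`, the dictionary keys `name` and `account`, and the sample data are all hypothetical.

```python
import re
from typing import Dict, List, Optional


def assign_category(
    invoice_text: str,
    patterns: List[str],
    categories: List[Dict[str, str]],
    attribute: str,
) -> Optional[str]:
    """Return the category name for all line items, or None to leave the field empty."""
    # Each rule contributes at most one reference: its first match in the text.
    references = []
    for pattern in patterns:
        match = re.search(pattern, invoice_text)
        if match:
            references.append(match.group(0))

    # No rule matched, or several rules matched: nothing is assigned here.
    if len(references) != 1:
        return None

    # Exact comparison only: no trimming, case folding, or fuzzy matching.
    matches = [c for c in categories if c.get(attribute) == references[0]]

    # Zero or multiple matching categories also leave the field empty.
    return matches[0]["name"] if len(matches) == 1 else None


# Hypothetical sample data.
categories = [
    {"name": "Office expenses", "account": "GL 6000"},
    {"name": "Professional services", "account": "GL 6100"},
]
rules = [r"\bGL \d{4}\b"]

print(assign_category("Cost account: GL 6100", rules, categories, "account"))
```

In this sketch a return value of `None` simply means the regex step assigned nothing; whether the field then stays empty or is filled by other logic follows the rules described above.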

Creating a string-matching rule to assign categories

To create a category regex rule:

  1. Go to Invoice business rules and open the String matching (Regex) tab.

  2. Click the Add regex button.

To create a new rule, enter the following information:

Field

Description

Regex syntax

Regular expressions (regex) are a special syntax for recognizing patterns in text.
Suppose you want to find, in a book, a word made of the digits 567 followed by a run of letters, such as APPLE (the full string would be 567APPLE). A regex lets you write a pattern that extracts exactly that string from the data.


You can test the pattern directly in the rule setup. The test checks that the pattern is valid regular expression syntax.

Matched object attribute

Specifies the data field Yokoy matches against the string extracted by the regex from the invoice. Currently, you can only match against specific field values in Category:

  • Category account (ERP code)

  • Category name

  • Custom field (any custom fields that may have been set up for your company in the category)

Target invoice field placement

Specifies where the target field is located. In this case, select Line item.

Target invoice field

Indicates the invoice field to which the matched value is assigned. In this case, it should be Category.

Click Save to keep the rule without activating, or Activate to apply the rule to any new invoice uploaded.
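To make the 567APPLE example from the Regex syntax field concrete, here is a minimal Python sketch. It assumes the letters are uppercase and that there is at least one of them; the helper names `is_valid_regex` and `extract_first` are hypothetical, and the regex dialect used by Yokoy may differ in details from Python's `re` module.

```python
import re
from typing import Optional

# The digits 567 followed by one or more uppercase letters,
# with word boundaries so that longer strings are not matched.
PATTERN = r"\b567[A-Z]+\b"


def is_valid_regex(pattern: str) -> bool:
    """Check only that the pattern compiles, similar to a syntax test."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def extract_first(pattern: str, text: str) -> Optional[str]:
    """Return the first string in text that matches pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(is_valid_regex(PATTERN))
print(extract_first(PATTERN, "Order reference 567APPLE shipped today"))
```

A syntax check of this kind only confirms that the pattern is well formed. It does not confirm that the pattern finds the intended reference, so it is worth trying the pattern against text from a real invoice as well.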

Possible setup for assigning a category

You receive invoices from multiple suppliers and want each invoice line item to be automatically assigned to the correct GL account. Each invoice contains a unique reference identifying the GL account, such as 6000 for office expenses, 6100 for professional services, and 6200 for subcontractor costs. This ERP code appears within the invoice document itself and follows a specific format: GL + space + four digits.

You set up a rule to:

  1. Define a recognition pattern for this reference using a regex rule: \bGL \d{4}\b. The word boundaries (\b) prevent the pattern from matching inside longer sequences.

  2. Select the data field on the category that should be matched against the extracted reference: in this case, Category account (ERP code).

Regex rule to extract GL 2010

Categories with Account (ERP)

For every newly uploaded invoice (whether uploaded manually or via email), Yokoy scans the PDF to detect a string that matches the specified regex pattern. When a match is found, Yokoy compares it against the ERP codes stored in the category records and automatically assigns the category whose ERP code matches the detected value (e.g. GL 2100) to all line items for that invoice.
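The behavior of the pattern \bGL \d{4}\b can be tried out with a short Python sketch. The lookup table below is hypothetical sample data, and it assumes the ERP codes are stored with the GL prefix so that the whole matched string (for example GL 6100) can be compared exactly.

```python
import re
from typing import Optional

GL_REFERENCE = re.compile(r"\bGL \d{4}\b")

# Hypothetical ERP codes and category names, stored with the "GL " prefix.
ERP_CODE_TO_CATEGORY = {
    "GL 6000": "Office expenses",
    "GL 6100": "Professional services",
    "GL 6200": "Subcontractor costs",
}


def find_reference(text: str) -> Optional[str]:
    """Return the first 'GL ' + four-digit reference in text, or None."""
    match = GL_REFERENCE.search(text)
    return match.group(0) if match else None


def category_for(text: str) -> Optional[str]:
    """Look up the category whose ERP code equals the extracted reference."""
    reference = find_reference(text)
    return ERP_CODE_TO_CATEGORY.get(reference) if reference else None


# The boundaries reject longer sequences on either side of the reference.
print(find_reference("Booking account: GL 6100."))
print(find_reference("Serial GL 61005"))
print(category_for("Please book to GL 6200"))
```

Because the comparison is exact, a stored ERP code of 6100 without the prefix would not equal the extracted string GL 6100 in this sketch.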
