An Introduction to Matcha

A new data-driven programming language


What is Matcha?

I'm probably a good person to answer this question, since I invented Matcha. Matcha is a completely new programming language that I began developing earlier this year (2025). It is still changing rapidly, and at the time of writing the current version is 0.0.6. The plan is a limited public beta release at version 0.1.0. The Matcha interpreter is written in Go, but the language follows a philosophy similar to Python's, even though it works very differently.

Why does anyone need yet another programming language? Isn't Python enough?

As one of Python's biggest fans, I wholeheartedly agree that Python is more than enough for most things. What sets Matcha apart isn't that it's another programming language, but that it uses a different programming paradigm: the data-driven paradigm.

Matcha is a specialized pattern matching language designed for text processing and transformation. Unlike traditional imperative programming languages that focus on explicit instructions and control flow, Matcha embraces a data-driven paradigm where the structure and content of the input data guide program execution.

At its core, Matcha operates on a simple yet powerful principle: find patterns in text and perform actions when they match. This approach creates elegant, declarative programs that focus on describing what to look for rather than how to process it.

The Data-Driven Paradigm

Traditional programming follows a control-driven approach: the programmer explicitly defines the execution path through conditionals, loops, and function calls. In contrast, Matcha's data-driven paradigm lets the data itself determine what happens next.

Key characteristics of Matcha's data-driven approach include:

  1. Pattern-Based Execution: Rather than sequential execution, Matcha programs consist of pattern-action pairs. The runtime scans the input and executes actions when their associated patterns match.
  2. Declarative Style: Programs describe what to find rather than how to find it. The language runtime handles the detection and matching strategy.
  3. Automatic Position Tracking: The interpreter automatically manages position, line, and column tracking as it processes the input.
  4. Context-Aware Processing: Patterns can be sensitive to their context (like a document beginning using `^`), allowing for more nuanced matching.
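
The characteristics above can be sketched in Python. This is only an illustration of the pattern-action idea, not how the Matcha interpreter is implemented: the `run` function, the "earliest match wins" strategy, and the way line and column are computed are all assumptions made for the example.

```python
import re

def run(rules, text):
    """Scan text once; at each step fire the rule whose pattern matches earliest."""
    output = []
    pos = 0
    while pos <= len(text):
        best = None
        for pattern, action in rules:
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best[0].start()):
                best = (m, action)
        if best is None:
            break
        m, action = best
        # Position tracking: derive 1-based line and column from the match offset.
        line = text.count("\n", 0, m.start()) + 1
        column = m.start() - (text.rfind("\n", 0, m.start()) + 1) + 1
        output.append(action(m, line, column))
        # Always advance, even on an empty match, so the loop terminates.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return output

# Hypothetical rules: each pairs a pattern with an action.
rules = [
    (re.compile(r"ERROR"), lambda m, line, col: f"error at {line}:{col}"),
    (re.compile(r"WARN"), lambda m, line, col: f"warning at {line}:{col}"),
]
text = "ok\nWARN disk\nfoo ERROR net\n"
print(run(rules, text))  # ['warning at 2:1', 'error at 3:5']
```

Note that the WARN rule fires first even though it is listed second: the order of the data, not the order of the rules, decides what runs.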

Matcha Program Structure

A Matcha program consists of a series of pattern blocks, each containing:

  1. A regular expression pattern to match
  2. One or more actions to execute when the pattern matches

Here's the basic syntax:

pattern <regex_pattern>
    <action> <parameters>
    <action> <parameters>
    ...

And here's the Hello World program:

pattern ^
    print "Hello World!"
            

It's worth noting that Matcha regex differs from standard regex in a few ways. For instance, the ^ character matches the very start of the document, whereas in many regex tools it matches the start of each line. Since every document has a start, the print action is triggered and the Matcha interpreter outputs Hello World!
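
To make the difference concrete, here is a small Python comparison. It uses Python's `re` module, not Matcha: `\A` is Python's start-of-input anchor and is the closest analogue to the Matcha `^` described above, while `^` in multiline mode shows the line-by-line behaviour.

```python
import re

text = "first line\nsecond line"

# \A anchors to the very start of the input, whatever flags are set.
document_start = [m.start() for m in re.finditer(r"\A", text, re.MULTILINE)]

# In multiline mode, ^ matches at the start of every line.
line_starts = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]

print(document_start)  # [0]
print(line_starts)     # [0, 11]
```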

The program is run from the command line like this:

> matcha hello.matcha data.txt
            

For this particular program it doesn't matter what is actually in the file data.txt (which could equally be data.xml, data.csv, data.json, data.log, etc.): the start of the document is matched whatever it contains, and the print action is triggered.

When hello.matcha is a more complicated program, it still defines the pattern-action groups, but the order and structure of the data in data.txt control how those rules are executed, including, but not limited to, the order of execution.
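
A rough Python illustration of that point follows. The two rules and their names (`greeting`, `farewell`) are hypothetical, and combining them into a single alternation is just a convenient way to show the idea in Python; it says nothing about how Matcha dispatches rules internally.

```python
import re

# Two hypothetical rules combined into one alternation; the named group that
# matched tells us which rule fired.
RULES = re.compile(r"(?P<greeting>hello)|(?P<farewell>goodbye)")

def fired(text):
    """Return the names of the rules in the order the data triggers them."""
    return [m.lastgroup for m in RULES.finditer(text)]

print(fired("hello, then goodbye"))          # ['greeting', 'farewell']
print(fired("goodbye, hello, hello again"))  # ['farewell', 'greeting', 'greeting']
```

The rules are identical in both calls; only the input changes, and with it the order and number of actions that fire.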

Available Actions

Matcha supports several actions that can be performed when a pattern matches. Here are some common actions:

  • print: Outputs text, with optional stream substitution
  • prints: Prints to a stream rather than to the screen
  • echo: Outputs the matched text directly
  • echos: Outputs the matched text directly to a stream
  • set: Assigns a value to a stream
  • transform: Transforms the contents of a stream or matched pattern in one of several ways
  • comment: Documents the purpose of a pattern (not executed)

Streams and Special Values

It may not look like it at first glance, but Matcha is a strongly typed language. There is only one type: the stream. A stream holds a sequence of data and looks similar to a string in other programming languages, but it has enhanced features specific to Matcha. The name of every stream begins with a $ symbol.

Matcha provides several built-in streams, such as $MATCH (the complete text that matched the pattern), $LINE (the current line number), $COLUMN (the current column number), $VERSION (the Matcha version), and so on. Capture groups from regular expressions can be accessed with $1, $2, etc.

By convention, built-in streams are $UPPERCASE while user streams are $lowercase, though the language does not enforce this. Letters, digits, and underscores are the only characters permitted in stream names.
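
As a sketch of how stream references inside a quoted string might be resolved, the Python below applies the naming rule (letters, digits, and underscores after a $). The `substitute` helper is hypothetical, leaving unknown names untouched is an assumption, and treating `$NL` as a newline is inferred from how it is used in the examples in this article rather than from a documented definition.

```python
import re

# A stream reference: '$' followed by letters, digits, or underscores.
STREAM_REF = re.compile(r"\$([A-Za-z0-9_]+)")

def substitute(template, streams):
    """Replace each $name with its value; unknown names are left as-is (an assumption)."""
    return STREAM_REF.sub(lambda m: streams.get(m.group(1), m.group(0)), template)

# Hypothetical stream values for illustration.
streams = {"docTitle": "Matcha Notes", "NL": "\n", "1": "2025"}

print(repr(substitute("Title: $docTitle$NL", streams)))  # 'Title: Matcha Notes\n'
print(repr(substitute("Year: $1$NL", streams)))          # 'Year: 2025\n'
```

Because $ is not a legal name character, adjacent references such as `$docTitle$NL` split cleanly into two streams.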

Example - Simple Document Metadata Extraction

Here is a simple Matcha program that extracts the title, author, and publication year and month from a document header:

pattern ^
    print "Document Analysis:$NL"

pattern Title:\s*(.+)$
    set docTitle = $1
    print "Title: $docTitle$NL"

pattern Author:\s*(.+)$
    set docAuthor = $1
    print "Author: $docAuthor$NL"

pattern Date:\s*(\d{4})-(\d{2})-(\d{2})$
    print "Publication Year: $1$NL"
    print "Publication Month: $2$NL"
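
To see what the capture groups in this program would hold, here is the same Title and Date matching tried in Python against a made-up header. The header text is hypothetical, and `re.MULTILINE` is used so that $ matches at each line end; whether $ behaves this way in Matcha regex is an assumption.

```python
import re

# Hypothetical document header, invented for this example.
header = "Title: Tea Notes\nAuthor: A. Writer\nDate: 2025-03-14\n"

# Group 1 of the Title pattern corresponds to $1 in that pattern block.
title = re.search(r"Title:\s*(.+)$", header, re.MULTILINE).group(1)

# Groups 1-3 of the Date pattern correspond to $1, $2, and $3.
year, month, day = re.search(
    r"Date:\s*(\d{4})-(\d{2})-(\d{2})$", header, re.MULTILINE
).groups()

print(title)        # Tea Notes
print(year, month)  # 2025 03
```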

The Power of Data-Driven Programming in Matcha

The data-driven approach of Matcha offers several advantages:

  1. Simplified Logic: No need for complex control structures or state management - patterns and actions are all you need.
  2. Focus on Content: Programs center on the structure and content of the data, not algorithmic details.
  3. Declarative Clarity: Matcha programs read almost like specifications of what to find, making them easier to understand.
  4. Compact Solutions: Complex text processing tasks can often be expressed in just a few lines.
  5. Natural Parallelism: The pattern-based approach lends itself to potential parallel processing optimizations.

Use Cases for Matcha

Matcha is designed to excel in scenarios such as:

  • Data engineering
  • Log file analysis
  • Data extraction and transformation
  • Code generation
  • Document processing
  • Content enrichment
  • Parsing tasks
  • Text-based data mining
  • Data notation conversion

Conclusion

Matcha represents a refreshing departure from traditional programming paradigms. By embracing a data-driven approach with pattern matching at its core, it offers a unique and powerful way to process textual data. Whether you're analyzing logs, extracting information, or transforming content, Matcha's declarative style allows you to express your intent concisely and clearly. As you explore Matcha, you'll discover the elegance of letting the data itself guide your program's execution path, freeing you to focus on what you want to find rather than how to find it.