Pattern Matching

MergeGuide uses a dual-layer detection engine: regex for fast known-pattern matching, and Semgrep for AST-based taint analysis and data flow tracking. Built-in policies use both layers. Custom policies can use regex patterns (and, in some configurations, Semgrep rules).

Detection Engine Layers

| Layer | Technology | Best For |
|---|---|---|
| Layer 1 | Regex | Known patterns, string matching, fast scanning |
| Layer 2 | Semgrep AST taint analysis | Data flow, injection vulnerabilities, language-aware detection |

When you write a custom policy using type: regex, you’re writing a Layer 1 pattern. The built-in policies for injection vulnerabilities (SQL, XSS, command injection) leverage Semgrep’s taint analysis in Layer 2.

MergeGuide supports multiple pattern types for different use cases.

Regex Patterns

Regular expressions are the most common pattern type.

Basic Syntax

patterns:
  - type: regex
    value: "console\\.log"
    message: "console.log detected"
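
The doubled backslash is YAML escaping: inside a double-quoted YAML string, `\\` becomes a single backslash, so the regex engine receives `console\.log`. The sketch below uses Python's `re` module as a stand-in (MergeGuide's actual regex flavor is not stated here, so this is an assumption) to show why the dot is escaped.

```python
import re

# The pattern as the regex engine sees it once YAML has unescaped "console\\.log".
escaped = re.compile(r"console\.log")
# The same pattern without the escape: "." now matches any single character.
unescaped = re.compile(r"console.log")

samples = ["console.log('hi')", "consoleXlog('hi')"]

for line in samples:
    # Print whether each variant matches the sample line.
    print(line, bool(escaped.search(line)), bool(unescaped.search(line)))
```

Only the escaped form rejects `consoleXlog`, which is why a literal dot should always be written as `\\.` in these policies.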

Capturing Groups

Use groups to provide context in messages:
patterns:
  - type: regex
    value: "(password|secret|key)\\s*=\\s*['\"]([^'\"]+)['\"]"
    message: "Hardcoded $1 detected with value partially shown"
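
To illustrate how a `$1` placeholder could be filled from the first capturing group, here is a minimal sketch in Python. The substitution rule and the `render_message` helper are assumptions made for illustration, not MergeGuide's documented behavior.

```python
import re

# Regex from the policy above, written as a Python raw string.
PATTERN = re.compile(r"""(password|secret|key)\s*=\s*['"]([^'"]+)['"]""")
MESSAGE = "Hardcoded $1 detected with value partially shown"


def render_message(line):
    """Return MESSAGE with $N replaced by capture group N, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # Hypothetical placeholder rule: $1 -> group 1, $2 -> group 2, and so on.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), MESSAGE)


print(render_message('secret = "hunter2"'))
print(render_message("password = os.environ['PW']"))
```

Note that group 2 holds the secret value itself, so a message that referenced `$2` would echo the secret into scan output.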

Common Patterns

Hardcoded Secrets

# API keys
value: "(api[_-]?key|apikey)\\s*[=:]\\s*['\"][a-zA-Z0-9_-]{20,}['\"]"

# AWS keys
value: "AKIA[0-9A-Z]{16}"

# Generic secrets
value: "(secret|password|passwd|pwd)\\s*[=:]\\s*['\"][^'\"]{8,}['\"]"
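
Before adding patterns like these to a policy, it can help to try them on sample lines. The following sketch does that with Python's `re` module; the pattern names and the `classify` helper are hypothetical, and the AWS value is the placeholder key used in AWS documentation.

```python
import re

# The three patterns above, as raw strings (YAML escaping removed).
SECRET_PATTERNS = {
    "api-key": r"""(api[_-]?key|apikey)\s*[=:]\s*['"][a-zA-Z0-9_-]{20,}['"]""",
    "aws-key": r"AKIA[0-9A-Z]{16}",
    "generic-secret": r"""(secret|password|passwd|pwd)\s*[=:]\s*['"][^'"]{8,}['"]""",
}


def classify(line):
    """Return the names of every pattern that matches the line."""
    return [name for name, rx in SECRET_PATTERNS.items() if re.search(rx, line)]


samples = [
    'api_key = "abcd1234efgh5678ijkl9012"',
    'aws_id = "AKIAIOSFODNN7EXAMPLE"',
    "password: 'correct-horse-battery'",
    'password = "short"',                    # value shorter than 8 characters
    'API_KEY = "abcd1234efgh5678ijkl9012"',  # upper-case name
]

for line in samples:
    print(classify(line), line)
```

The last two samples are not flagged: the generic pattern requires at least eight characters, and the API key pattern is case sensitive unless a case-insensitive flag is set.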

Security Issues

# eval usage
value: "\\beval\\s*\\("

# SQL injection
value: "(SELECT|INSERT|UPDATE|DELETE)[^;]*\\$\\{[^}]+\\}"

# XSS
value: "innerHTML\\s*=\\s*[^;]*\\$\\{"
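
These regexes only recognize one textual shape of each problem. The sketch below, using Python's `re` module as a stand-in engine with hypothetical names, shows what they catch and what slips through. It illustrates why injection detection in the built-in policies relies on Layer 2 taint analysis rather than regex alone.

```python
import re

CHECKS = {
    "eval": re.compile(r"\beval\s*\("),
    "sql-template": re.compile(r"(SELECT|INSERT|UPDATE|DELETE)[^;]*\$\{[^}]+\}"),
    "xss-template": re.compile(r"innerHTML\s*=\s*[^;]*\$\{"),
}


def findings(line):
    """Return the names of the checks that match a single line of code."""
    return [name for name, rx in CHECKS.items() if rx.search(line)]


samples = [
    "eval (userInput)",                                    # flagged: \s* allows the space
    "medieval(castle)",                                    # not flagged: \b needs a word boundary
    "db.query(`SELECT * FROM users WHERE id = ${id}`)",    # flagged: template interpolation
    'db.query("SELECT * FROM users WHERE id = " + id)',    # missed: string concatenation
    "el.innerHTML = `<b>${name}</b>`",                     # flagged: template interpolation
]

for line in samples:
    print(findings(line), line)
```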

Code Quality

# TODO comments
value: "//\\s*TODO:?"

# Console statements
value: "console\\.(log|debug|info|warn|error)\\s*\\("

# Debugger statements
value: "\\bdebugger\\b"

Regex Flags

patterns:
  - type: regex
    value: "todo"
    flags: "i"  # Case insensitive: matches TODO, Todo, todo

| Flag | Description |
|---|---|
| i | Case insensitive |
| m | Multiline (^ and $ match line boundaries) |
| s | Dot matches newline |
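
The effect of each flag can be seen with Python's `re` module, whose `IGNORECASE`, `MULTILINE`, and `DOTALL` options correspond to `i`, `m`, and `s`. This is a sketch under the assumption that MergeGuide's flags follow these conventional meanings.

```python
import re

source = "first line\nTODO fix this\n/* open\nclose */"

# i -> re.IGNORECASE: "todo" also matches "TODO".
case_sensitive = re.findall(r"todo", source)
case_insensitive = re.findall(r"todo", source, re.IGNORECASE)

# m -> re.MULTILINE: ^ matches at the start of every line, not only the input.
without_m = re.findall(r"^TODO", source)
with_m = re.findall(r"^TODO", source, re.MULTILINE)

# s -> re.DOTALL: "." also matches newlines, so one match can span lines.
without_s = re.findall(r"/\*.*\*/", source)
with_s = re.findall(r"/\*.*\*/", source, re.DOTALL)

print(case_sensitive, case_insensitive)
print(without_m, with_m)
print(without_s, with_s)
```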

Negative Patterns

Exclude certain contexts using negative lookahead:
# Match console.log, skipping lines that begin with a // comment
value: "^(?!\\s*//).*console\\.log"

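# Match password unless ".test." appears later on the same line
# (this checks file contents, not file names; to skip test files, use files: globs as in File Context)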
value: "password(?!.*\\.test\\.)"
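
Lookaheads are easy to misread, so it is worth checking exactly what they exclude. The sketch below runs both patterns with Python's `re` module (an assumed stand-in for the engine) and uses multiline mode so that `^` applies to each line.

```python
import re

NOT_COMMENTED = re.compile(r"^(?!\s*//).*console\.log", re.MULTILINE)
NOT_FOLLOWED_BY_TEST = re.compile(r"password(?!.*\.test\.)")

source = """\
console.log("a");
  // console.log("b");
doWork(); console.log("c"); // trailing comment
doWork(); // console.log("d")
"""

# Each hit is the text from the start of the line up to "console.log".
hits = [m.group(0) for m in NOT_COMMENTED.finditer(source)]
print(hits)

# The second lookahead inspects the rest of the line, not the file name.
print(bool(NOT_FOLLOWED_BY_TEST.search('password = "x"')))
print(bool(NOT_FOLLOWED_BY_TEST.search('password = load("creds.test.json")')))
```

The line containing `"b"` is skipped, but the call inside the trailing comment on the last line is still reported, because the lookahead only checks how the line begins.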

AST Patterns

Abstract Syntax Tree patterns understand code structure.

JavaScript/TypeScript AST

patterns:
  - type: ast
    language: javascript
    value: |
      CallExpression[callee.name="eval"]
    message: "eval() call detected"

Python AST

patterns:
  - type: ast
    language: python
    value: |
      Call[func.id="eval"]
    message: "eval() call detected"

AST Query Syntax

MergeGuide uses a CSS-like selector syntax for AST queries:
NodeType[attribute="value"]
NodeType > ChildNodeType
NodeType DescendantNodeType
Examples:
# Function with specific name
FunctionDeclaration[id.name="dangerousFunction"]

# Method call on specific object
CallExpression[callee.object.name="document"][callee.property.name="write"]

# Any throw statement
ThrowStatement

# Import from specific package
ImportDeclaration[source.value="lodash"]
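
To make the selector idea concrete, here is a minimal sketch that evaluates `NodeType[attr.path="value"]` selectors against Python's built-in `ast` module. It is not MergeGuide's implementation: the `query` and `resolve` helpers are hypothetical, only attribute filters are supported (no `>` or descendant combinators), and Python node names differ from the JavaScript ones above (`Raise` rather than `ThrowStatement`, `ImportFrom` rather than `ImportDeclaration`).

```python
import ast
import re

SELECTOR = re.compile(r'^(\w+)((?:\[[\w.]+="[^"]*"\])*)$')
FILTER = re.compile(r'\[([\w.]+)="([^"]*)"\]')


def resolve(node, path):
    """Follow a dotted attribute path such as "func.id"; None if it breaks."""
    for part in path.split("."):
        node = getattr(node, part, None)
        if node is None:
            return None
    return node


def query(source, selector):
    """Return sorted line numbers of nodes matching the selector."""
    parsed = SELECTOR.match(selector.strip())
    if parsed is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    node_type, filters = parsed.group(1), FILTER.findall(parsed.group(2))
    return sorted(
        getattr(node, "lineno", 0)
        for node in ast.walk(ast.parse(source))
        if type(node).__name__ == node_type
        and all(resolve(node, path) == value for path, value in filters)
    )


code = '''
from os import path
result = eval(user_input)
label = "eval(not code)"
raise ValueError(label)
'''

print(query(code, 'Call[func.id="eval"]'))
print(query(code, "Raise"))
print(query(code, 'ImportFrom[module="os"]'))
```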

AST Benefits

  • Structure-aware: Won’t match code in strings or comments
  • Language-specific: Understands language semantics
  • Precise: Can target specific code constructs
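
The structure-aware point can be demonstrated by scanning the same snippet twice, once with a regex and once with a parser. This sketch uses Python's own `ast` module for the comparison; it shows the general technique, not MergeGuide's engine.

```python
import ast
import re

source = '''
# eval(commented_out)
hint = "never call eval(data)"
value = eval(expression)
'''

# Regex scan: flags every line whose text contains "eval(".
regex_lines = [
    number
    for number, line in enumerate(source.splitlines(), start=1)
    if re.search(r"\beval\s*\(", line)
]

# AST scan: flags only real calls to a function named eval.
ast_lines = [
    node.lineno
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "eval"
]

print(regex_lines)
print(ast_lines)
```

The regex reports the comment and the string literal as well as the real call, while the AST walk reports only the call on line 4.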

AST Limitations

  • Requires parsing (slower than regex)
  • Language-specific patterns needed
  • More complex to write

Semantic Patterns

Semantic patterns are high-level, named checks that detect code behaviors rather than specific text.

Available Semantic Patterns

patterns:
  - type: semantic
    value: sql-string-concatenation
    message: "Potential SQL injection"

  - type: semantic
    value: hardcoded-credential
    message: "Hardcoded credential detected"

  - type: semantic
    value: insecure-random
    message: "Insecure random number generation"

Semantic Pattern List

| Pattern | Description |
|---|---|
| sql-string-concatenation | SQL built with string operations |
| hardcoded-credential | Secrets in source code |
| insecure-random | Math.random for security |
| missing-input-validation | Unvalidated user input |
| unsafe-deserialization | Deserializing untrusted data |
| path-traversal | File path from user input |
| command-injection | Shell commands with user input |
| open-redirect | Redirect URL from user input |
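
As an illustration of the kind of code that sql-string-concatenation describes, the sketch below contrasts a query built with string operations against a parameterized one, using Python's built-in `sqlite3`. How MergeGuide detects the pattern internally is not covered here; the table and function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_user_unsafe(name):
    # SQL built with string operations: user input becomes part of the query text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized query: the input is bound as data and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(payload))  # the injected OR clause selects every row
print(find_user_safe(payload))    # no user has this literal name
```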

Multi-Pattern Policies

Combine multiple patterns:
patterns:
  # Pattern 1: Direct eval
  - type: regex
    value: "\\beval\\s*\\("
    message: "Direct eval() usage"

  # Pattern 2: new Function
  - type: regex
    value: "new\\s+Function\\s*\\("
    message: "new Function() is equivalent to eval"

  # Pattern 3: setTimeout with string
  - type: ast
    language: javascript
    value: |
      CallExpression[callee.name="setTimeout"][arguments.0.type="Literal"]
    message: "setTimeout with string argument acts like eval"
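
A rough picture of how several patterns in one policy produce separate findings: the sketch below mirrors the two regex entries above in a hypothetical in-memory list and reports each hit with its own message. The AST entry is left out because it needs a JavaScript parser.

```python
import re

# Hypothetical in-memory mirror of the two regex entries in the policy above.
POLICY = [
    {"value": r"\beval\s*\(", "message": "Direct eval() usage"},
    {"value": r"new\s+Function\s*\(", "message": "new Function() is equivalent to eval"},
]


def scan(source):
    """Return (line_number, message) for every pattern hit, in line order."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for pattern in POLICY:
            if re.search(pattern["value"], line):
                findings.append((number, pattern["message"]))
    return findings


sample = (
    "const f = new Function('a', 'return a');\n"
    "const x = eval(input);\n"
    "const ok = 1;"
)

for finding in scan(sample):
    print(finding)
```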

Pattern Context

Line Context

Include surrounding lines for context:
patterns:
  - type: regex
    value: "TODO"
    context:
      before: 2
      after: 2
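
The `before` and `after` values can be read as a window of lines around each match. A minimal sketch of that windowing, assuming the window is clipped at the start and end of the file (the `matches_with_context` helper is hypothetical):

```python
import re


def matches_with_context(source, pattern, before=2, after=2):
    """Return each matching line number with its surrounding lines."""
    lines = source.splitlines()
    results = []
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            start = max(0, index - before)             # clip at the top of the file
            end = min(len(lines), index + after + 1)   # clip at the bottom
            results.append({"line": index + 1, "context": lines[start:end]})
    return results


source = "a\nb\nc // TODO\nd\ne\nf"
print(matches_with_context(source, r"TODO"))
```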

File Context

Apply patterns based on file location:
patterns:
  - type: regex
    value: "console\\.log"
    files:
      - "src/**"
      - "!src/**/*.test.*"
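
One plausible reading of this list is that a file is scanned when it matches at least one plain glob and none of the `!`-prefixed globs. The sketch below implements that reading with gitignore-style wildcards (`**` crosses directories, `*` does not). Both the semantics and the helper names are assumptions, so check real behavior with the CLI before relying on it.

```python
import re


def glob_to_regex(glob):
    """Translate a glob into a regex: ** crosses directories, * stays within one."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more directories
            i += 3
        elif glob.startswith("**", i):
            parts.append(r".*")         # anything, including slashes
            i += 2
        elif glob[i] == "*":
            parts.append(r"[^/]*")      # anything except a slash
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def is_selected(path, globs):
    """True if path matches an include glob and no "!" exclude glob."""
    includes = [g for g in globs if not g.startswith("!")]
    excludes = [g[1:] for g in globs if g.startswith("!")]
    included = any(glob_to_regex(g).fullmatch(path) for g in includes)
    excluded = any(glob_to_regex(g).fullmatch(path) for g in excludes)
    return included and not excluded


FILES = ["src/**", "!src/**/*.test.*"]

for path in ["src/app.ts", "src/app.test.ts", "src/utils/db.test.ts", "lib/app.ts"]:
    print(path, is_selected(path, FILES))
```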

Performance Tips

  1. Order matters: Put fast regex patterns before slow AST patterns
  2. Be specific: Narrow file patterns reduce scanning
  3. Avoid backtracking: Use atomic groups in complex regex
  4. Cache results: Patterns are cached per file

Regex Optimization

# Slow: excessive backtracking
value: ".*password.*"

# Fast: starts with a literal and is specific
value: "password\\s*="
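
The two patterns also differ in what they report, which the sketch below shows with Python's `re` module (an assumed stand-in for the engine). Wrapping a term in `.*` selects the same lines as searching for the bare term but forces the engine to consume the whole line, while the specific pattern pinpoints the assignment and ignores incidental mentions.

```python
import re

BROAD = r".*password.*"
SPECIFIC = r"password\s*="

line = 'db_password = "hunter2"  # rotate monthly'
prose = "remember to reset your password"

broad = re.search(BROAD, line)
specific = re.search(SPECIFIC, line)

# The broad match covers the whole line; the specific one covers the assignment.
print(broad.span(), specific.span(), repr(specific.group(0)))

# Only the broad pattern fires on a line that merely mentions the word.
print(bool(re.search(BROAD, prose)), bool(re.search(SPECIFIC, prose)))
```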

Testing Patterns

Test Mode

# Test pattern against file
mergeguide check --policy policy.yaml --test-pattern "console\\.log" file.ts

# Show all matches with context
mergeguide check --policy policy.yaml --verbose --show-matches

Pattern Playground

Use the dashboard pattern tester:
  1. Go to Policies > Create Policy
  2. Enter pattern in the Test tab
  3. Paste sample code
  4. See matches highlighted in real time