Upsilon Query Syntax

Upsilon is the query language used in Have I Been Squatted’s Rules Engine. You use it to write detection rules that match domain permutations based on signals such as DNS records, registration metadata, and TLS certificates.

Upsilon uses a Lucene-like query syntax that’s intuitive and easy to read. Queries are composed of field comparisons combined with logical operators.

field:value
field:>value
field:*pattern*

Upsilon tokenizes unquoted values on whitespace. If your value contains spaces, you have two options depending on the operator:

  • Quote the string (exact match and list membership):
title:"Google Maps"
technologies:["Google Analytics" "Google Tag Manager" "Nginx"]
  • Escape each space with a backslash (wildcard and array operators: *...*, @..., @*...*, @@..., @@*...*):
technologies:@*Google\ Maps*
http_banner:*nginx\ 1.18*

You can also escape delimiter characters in unquoted tokens using a backslash (e.g. \(, \), \[, \], \,, \\).
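
When rule values come from user input, the escaping rules above can be applied programmatically. The following is a minimal Python sketch, assuming that the characters needing a backslash are exactly the space and the delimiters listed above. `escape_unquoted` is a hypothetical helper, not part of any Upsilon tooling.

```python
# Characters assumed to need escaping in an unquoted token: the space
# plus the delimiters listed above. This set is an assumption.
SPECIAL = set(' ()[],\\')


def escape_unquoted(value: str) -> str:
    """Backslash-escape spaces and delimiter characters in a raw value."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)


# Build an "any element contains" clause for a value with a space in it.
clause = f"technologies:@*{escape_unquoted('Google Maps')}*"
print(clause)  # technologies:@*Google\ Maps*
```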

Combine conditions using AND, OR, and NOT:

field1:value1 AND field2:value2
field1:value1 OR field2:value2
NOT field:value
(field1:value1 OR field2:value2) AND field3:value3
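
If you generate rules from code, small helpers that always emit parentheses keep the grouping unambiguous. This is a sketch only: `all_of`, `any_of`, and `negate` are hypothetical names, and it assumes Upsilon accepts nested and redundant parentheses.

```python
def all_of(*clauses: str) -> str:
    """Join clauses with AND inside one pair of parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR inside one pair of parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """Prefix a clause with NOT."""
    return "NOT " + clause


query = all_of(
    any_of("field1:value1", "field2:value2"),
    negate("field3:value3"),
)
print(query)  # ((field1:value1 OR field2:value2) AND NOT field3:value3)
```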

String fields support several matching patterns:

Exact Match - Match the entire value:

permutation:amazon.com
kind:typosquatting

Contains - Match strings containing a substring:

permutation:*amazon*
http_banner:*nginx*

Starts With - Match strings beginning with a prefix:

permutation:amazon*
dns_ns:ns1*

Ends With - Match strings ending with a suffix:

permutation:*.com
dns_mx:*mail.protection.outlook.com
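
To make the four string patterns concrete, here is a rough Python model of how they could be evaluated. It assumes case-sensitive comparison and ignores backslash escapes, neither of which this page specifies. `wildcard_match` is a hypothetical function used for illustration.

```python
def wildcard_match(value: str, pattern: str) -> bool:
    """Evaluate a pattern with an optional leading/trailing '*' against value."""
    starts = pattern.startswith("*")
    ends = len(pattern) > 1 and pattern.endswith("*")
    core = pattern.strip("*")
    if starts and ends:
        return core in value           # *text*  -> contains
    if ends:
        return value.startswith(core)  # text*   -> starts with
    if starts:
        return value.endswith(core)    # *text   -> ends with
    return value == pattern            # text    -> exact match


print(wildcard_match("amazon-login.com", "*amazon*"))    # True
print(wildcard_match("amazon-login.com", "amazon*"))     # True
print(wildcard_match("amazon-login.com", "*.com"))       # True
print(wildcard_match("amazon-login.com", "amazon.com"))  # False
```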

Check if a field exists (has any value):

_exists_:whois
_exists_:origin_x509.subject

Check if a field does not exist:

NOT _exists_:dns_a

Number fields support comparison operators:

levenshtein_distance:>5 # greater than
levenshtein_distance:>=5 # greater than or equal
levenshtein_distance:<10 # less than
levenshtein_distance:<=10 # less than or equal
levenshtein_distance:=7 # equal to
# High phishing score
classification.phishing:>0.8
# ASN in specific range
geolocation.asn.number:>=10000 AND geolocation.asn.number:<=20000
# Low edit distance
levenshtein_distance:<3

Boolean fields can be matched directly:

origin_x509.is_ca:true
origin_x509.is_ca:false
certificate_transparency.is_precert:true

Date fields support comparison operators and special time-based functions:

# RFC3339 format dates
registration_metadata.registration_date:>"2024-01-01T00:00:00Z"
origin_x509.not_after:<"2025-12-31T23:59:59Z"

Use the .days_since and .days_until suffixes to compare the number of days between a date field and the current time:

# Registered within last 30 days
registration_metadata.registration_date.days_since:<=30
# Expires in more than 90 days
registration_metadata.expiration_date.days_until:>90
# Certificate valid for at least 365 days
origin_x509.not_after.days_until:>=365
# Certificate issued within last 7 days
origin_x509.not_before.days_since:<=7
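
The day-based comparisons can be reproduced offline to check a threshold before deploying a rule. The sketch below assumes whole days rounded down and UTC timestamps. The engine's exact rounding is not documented here, and `days_since` and `days_until` are hypothetical Python helpers named after the suffixes.

```python
from datetime import datetime, timezone


def parse_rfc3339(text: str) -> datetime:
    """Parse an RFC3339 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def days_since(text: str, now: datetime) -> int:
    """Whole days elapsed from the timestamp to now (rounding is assumed)."""
    return (now - parse_rfc3339(text)).days


def days_until(text: str, now: datetime) -> int:
    """Whole days remaining from now to the timestamp (rounding is assumed)."""
    return (parse_rfc3339(text) - now).days


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(days_since("2024-01-01T00:00:00Z", now))  # 30
print(days_until("2024-04-30T00:00:00Z", now))  # 90
```

With these values, a domain registered on 2024-01-01 would still satisfy `registration_date.days_since:<=30` on 2024-01-31.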

Array fields represent lists of values (e.g., DNS records, technologies).

Use @ prefix for “any element contains”:

# Any DNS A record contains "192.168"
dns_a:@*192.168*
# Any technology contains "Google Maps"
technologies:@*Google\ Maps*
# Any technology starts with "Apache"
technologies:@Apache*
# Any nameserver ends with "cloudflare.com"
dns_ns:@*cloudflare.com

Use @@ prefix to require all elements match:

# All nameservers contain "cloudflare"
dns_ns:@@*cloudflare*
# All MX records end with ".google.com"
dns_mx:@@*.google.com
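
The difference between @ and @@ maps onto "any" versus "all" over the array elements. The Python sketch below uses `fnmatch.fnmatchcase` as a stand-in for Upsilon's wildcard matching. That is an assumption: `fnmatchcase` also treats `?` and `[...]` as special, which Upsilon may not.

```python
from fnmatch import fnmatchcase


def any_element(values: list[str], pattern: str) -> bool:
    """'@' semantics: true when at least one element matches."""
    return any(fnmatchcase(v, pattern) for v in values)


def all_elements(values: list[str], pattern: str) -> bool:
    """'@@' semantics: true only when every element matches."""
    return all(fnmatchcase(v, pattern) for v in values)


dns_ns = ["ns1.cloudflare.com", "ns2.example.net"]
print(any_element(dns_ns, "*cloudflare*"))   # True
print(all_elements(dns_ns, "*cloudflare*"))  # False
```

Note that Python's `all()` returns True for an empty list. How Upsilon treats @@ on an empty array is not stated on this page, so test that case before relying on it.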

Query array properties:

# Array has more than 5 elements
technologies.len:>5
# Array has at least 2 elements
dns_a.len:>=2

IP address fields support special operators for network matching:

Use # prefix for exact IP address matching:

# Match exact IPv4 address
dns_a:#192.168.1.1
# Match exact IPv6 address
dns_aaaa:#2001:db8::1

Use # prefix with CIDR notation to match IP addresses within a subnet:

# Match any IP in the 192.168.0.0/16 subnet
dns_a:#192.168.0.0/16
# Match IPv6 subnet
dns_aaaa:#2001:db8::/32
# Match private IP ranges
dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16
# Domain resolves to AWS IP space
dns_a:#54.0.0.0/8 OR dns_a:#52.0.0.0/8
# Certificate SAN includes localhost
origin_x509.san_ip:#127.0.0.1
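
To check which addresses a CIDR rule would cover before saving it, Python's standard `ipaddress` module gives equivalent containment logic. This is an illustration of the matching idea rather than the engine's implementation; `ip_matches` is a hypothetical helper.

```python
import ipaddress


def ip_matches(address: str, target: str) -> bool:
    """True when address equals target or falls inside the target CIDR."""
    # A bare IP parses as a /32 (or /128) network, so one code path
    # covers both exact and subnet matching.
    network = ipaddress.ip_network(target, strict=False)
    return ipaddress.ip_address(address) in network


print(ip_matches("192.168.1.1", "192.168.1.1"))      # True
print(ip_matches("192.168.44.7", "192.168.0.0/16"))  # True
print(ip_matches("2001:db8::1", "2001:db8::/32"))    # True
print(ip_matches("8.8.8.8", "10.0.0.0/8"))           # False
```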

Check if a field value is in a list of values:

# ASN is one of several suspicious providers
geolocation.asn.number:[12345 67890 13335]
# Country is in high-risk regions
geolocation.country:[CN RU KP]
# Status is one of the listed transfer-prohibited codes
registration_metadata.status_code:[clientTransferProhibited serverTransferProhibited]

Detect domains hosted on suspicious infrastructure:

# Specific ASN (AS16509) combined with a high phishing score
geolocation.asn.number:16509 AND classification.phishing:>0.7
# Domains outside expected geographic regions
NOT geolocation.country:[US CA GB FR DE] AND levenshtein_distance:<5
# Hosted on Cloudflare with suspicious technology
dns_ns:@*cloudflare* AND technologies:@*phishing*

Detect suspicious domain registration characteristics:

# Recently registered with suspicious registrar
registration_metadata.registration_date.days_since:<=30 AND
registration_metadata.registrar:*namecheap*
# Domain expires soon (potential abandonment)
registration_metadata.expiration_date.days_until:<=90
# Suspicious status codes
registration_metadata.status_code:*clientHold* OR
registration_metadata.status_code:*pendingDelete*

Analyze TLS certificate properties:

# Certificate with many Subject Alternative Names
origin_x509.san_dns_count:>50
# Self-signed or suspicious issuer
origin_x509.is_ca:true OR origin_x509.issuer:*self-signed*
# Recently issued certificate
origin_x509.not_before.days_since:<=7
# Certificate includes wildcard DNS entry
origin_x509.san_dns:@**

Query Certificate Transparency log data:

# Domain seen frequently in CT logs
certificate_transparency.occurrences_count:>10
# Precertificate entry
certificate_transparency.is_precert:true
# Recently seen in CT logs (Unix timestamp)
certificate_transparency.last_seen_ts:>1704067200
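
Unix timestamps are hard to read by eye, so it can help to generate them from a calendar date. This short Python sketch assumes the field stores seconds since the epoch in UTC, which matches the magnitude of the example value above.

```python
from datetime import datetime, timezone

# Cutoff date expressed in UTC, then converted to whole seconds.
cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
timestamp = int(cutoff.timestamp())

print(f"certificate_transparency.last_seen_ts:>{timestamp}")
# certificate_transparency.last_seen_ts:>1704067200
```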

Detect suspicious DNS configurations:

# Any nameserver ends with "google.com"
dns_ns:@*google.com
# No MX records (suspicious for business domains)
NOT _exists_:dns_mx AND levenshtein_distance:<3
# Resolves to private IP space
dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16

Combine multiple signals for high-confidence detection:

# High-confidence phishing detection
(
permutation:*login* OR
permutation:*account* OR
permutation:*verify*
) AND
registration_metadata.registration_date.days_since:<=14 AND
classification.phishing:>0.8 AND
NOT geolocation.country:[US CA GB]
# Suspicious technology stack
technologies:@*WordPress* AND
technologies:@*PHP* AND
origin_x509.issuer:*Let's Encrypt* AND
registration_metadata.registration_date.days_since:<=7

Operators are evaluated in the following order (highest to lowest):

  1. Parentheses ()
  2. NOT
  3. AND
  4. OR

Always use parentheses to make complex queries explicit:

# Explicit precedence (recommended)
(field1:value1 OR field2:value2) AND field3:value3
# Implicit precedence (NOT equivalent: AND binds tighter than OR, so this parses as field1:value1 OR (field2:value2 AND field3:value3))
field1:value1 OR field2:value2 AND field3:value3
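
Because AND binds tighter than OR, dropping the parentheses changes which events match. Python's `not`, `and`, and `or` follow the same relative precedence as the list above, so plain booleans can stand in for the three field comparisons to show where the two forms disagree.

```python
from itertools import product

# a, b, c stand in for the results of field1, field2, field3 comparisons.
differing = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if ((a or b) and c) != (a or b and c)  # right side parses as a or (b and c)
]
print(differing)  # [(True, False, False), (True, True, False)]
```

In both differing cases field1 matches and field3 does not: the parenthesized query rejects the event, while the unparenthesized one accepts it.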

Use precise field comparisons rather than broad wildcards:

# Good - specific pattern
permutation:*-login.com
# Less optimal - too broad
permutation:*login*

Create robust rules by combining multiple indicators:

# Strong signal combination
registration_metadata.registration_date.days_since:<=30 AND
classification.phishing:>0.7 AND
NOT geolocation.asn.number:[15169 13335 16509]

Choose the right operator for your data type:

# Good - uses CIDR for IP ranges
dns_a:#192.168.0.0/16
# Less optimal - uses wildcards for IPs
dns_a:*192.168*

Start with conservative conditions and refine based on results:

# Start conservative
classification.phishing:>0.9 AND registration_metadata.registration_date.days_since:<=7
# Refine based on false positives/negatives
classification.phishing:>0.8 AND registration_metadata.registration_date.days_since:<=14

For a complete list of available fields and their types, see the Signals Reference.

Type          | Examples                                               | Operators
string        | permutation, http_banner                               | :, *:*, :*, *:, _exists_:
number        | levenshtein_distance, geolocation.asn.number           | :>, :>=, :<, :<=, :=
boolean       | origin_x509.is_ca, certificate_transparency.is_precert | :true, :false
date          | registration_metadata.registration_date                | :>, :<, .days_since:, .days_until:
array<string> | dns_ns, technologies                                   | :@*, :@@*, .len:
array<inet>   | dns_a, dns_aaaa                                        | :#, :#cidr, :@*

Field not found

unknown_field:value
# Error: Field 'unknown_field' does not exist

Invalid operator for type

permutation:>value
# Error: String fields don't support '>' operator

Missing required field

# Rule references field that doesn't exist in event
# Error: Field 'dns_a' not found in event

Rules are validated before execution to catch errors early:

  • Field names must exist in the schema
  • Operators must be appropriate for the field type
  • Syntax must be valid
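
The first two checks can be pictured as a lookup against a typed schema. The Python sketch below is illustrative only: `SCHEMA`, `ALLOWED`, and `validate` are hypothetical, the operator sets are a simplified reading of the type table, and the error wording only approximates the messages shown earlier.

```python
# Hypothetical schema: a few fields from this page mapped to their types.
SCHEMA = {
    "permutation": "string",
    "levenshtein_distance": "number",
    "origin_x509.is_ca": "boolean",
    "dns_a": "array<inet>",
}

# Operators assumed valid per type (simplified from the type table).
ALLOWED = {
    "string": {":", "_exists_"},
    "number": {":", ">", ">=", "<", "<=", "="},
    "boolean": {":"},
    "array<inet>": {"#", "@"},
}


def validate(field: str, operator: str) -> None:
    """Raise ValueError for an unknown field or a disallowed operator."""
    if field not in SCHEMA:
        raise ValueError(f"Field '{field}' does not exist")
    field_type = SCHEMA[field]
    if operator not in ALLOWED[field_type]:
        raise ValueError(f"{field_type} fields don't support '{operator}' operator")


validate("levenshtein_distance", ">")  # valid, returns without error
try:
    validate("permutation", ">")
except ValueError as err:
    print(err)  # string fields don't support '>' operator
```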