Upsilon query syntax
Upsilon is the query language used in the Have I Been Squatted Rules Engine. It expresses detection rules that match domain permutations using signals and patterns.
Query Syntax Overview
Upsilon uses a Lucene-like query syntax that’s intuitive and easy to read. Queries are composed of field comparisons combined with logical operators.
Basic Structure

    field:value
    field:>value
    field:*pattern*

Escaping and quoting
Upsilon tokenizes unquoted values on whitespace. When a value contains spaces, use one of two forms depending on the operator:
- Quoted strings (exact match and list membership):

    title:"Google Maps"
    technologies:["Google Analytics" "Google Tag Manager" "Nginx"]

- Escaped spaces for wildcard and array operators (*...*, @..., @*...*, @@..., @@*...*):

    technologies:@*Google\ Maps*
    http_banner:*nginx\ 1.18*

You can also escape delimiter characters in unquoted tokens using a backslash (e.g. \(, \), \[, \], \,, \\).
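
When queries are generated programmatically, the escaping rules above can be applied by a small helper. The following Python sketch is illustrative only: `escape_unquoted` and `quote_exact` are hypothetical names, not part of any Upsilon client library, and the set of escaped characters is taken from the examples in this section (space, parentheses, brackets, comma, backslash). How a double quote inside a quoted string should be escaped is not covered here, so the sketch refuses such values rather than guessing.

```python
# Characters escaped in unquoted tokens, per the examples in this section.
_ESCAPED = set(' ()[],\\')


def escape_unquoted(value: str) -> str:
    """Backslash-escape spaces and delimiters for use in an unquoted token."""
    return "".join("\\" + ch if ch in _ESCAPED else ch for ch in value)


def quote_exact(value: str) -> str:
    """Wrap a value in double quotes for exact match or list membership."""
    if '"' in value:
        # Escaping of embedded quotes is not documented here, so refuse.
        raise ValueError("embedded double quotes are not handled by this sketch")
    return f'"{value}"'


if __name__ == "__main__":
    # Wildcard operator: escape the space inside the token.
    print("technologies:@*" + escape_unquoted("Google Maps") + "*")
    # Exact match: quote the whole value instead.
    print("title:" + quote_exact("Google Maps"))
```

Run as a script, this prints `technologies:@*Google\ Maps*` and `title:"Google Maps"`, matching the two forms shown above.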
Logical Operators
Combine conditions using AND, OR, and NOT:

    field1:value1 AND field2:value2
    field1:value1 OR field2:value2
    NOT field:value
    (field1:value1 OR field2:value2) AND field3:value3

Very long chains of AND/OR can exceed the parser nesting limit (on the order of one thousand binary operators). Prefer shorter clauses and explicit parentheses.
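
One way to keep generated rules well under the nesting limit is to avoid emitting one OR per value. The Python sketch below assumes every value is an exact match on a single field, in which case the list membership form described later on this page expresses the same condition without any binary operators. `any_of` is a hypothetical helper name, and values containing spaces are quoted in the same way as the quoted-list example above.

```python
def any_of(field: str, values: list[str]) -> str:
    """Build field:[v1 v2 ...] instead of field:v1 OR field:v2 OR ..."""
    if not values:
        raise ValueError("at least one value is required")
    rendered = []
    for value in values:
        if '"' in value:
            # Escaping of embedded quotes is not documented, so refuse.
            raise ValueError("embedded double quotes are not handled by this sketch")
        # Quote only the values that contain a space.
        rendered.append(f'"{value}"' if " " in value else value)
    return f"{field}:[{' '.join(rendered)}]"


if __name__ == "__main__":
    print(any_of("geolocation.country", ["CN", "RU", "KP"]))
```

The single clause `geolocation.country:[CN RU KP]` replaces a chain of two OR operators; the saving grows with the number of values.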
Operators by Data Type
String Fields
String fields support various matching patterns:
Exact Match

    permutation:amazon.com
    kind:typosquatting

Wildcard Matching
Contains - Match strings containing a substring:

    permutation:*amazon*
    http_banner:*nginx*

Starts With - Match strings beginning with a prefix:

    permutation:amazon*
    dns_ns:ns1*

Ends With - Match strings ending with a suffix:

    permutation:*.com
    dns_mx:*mail.protection.outlook.com

Existence Check
Check if a field exists (has any value):

    _exists_:whois
    _exists_:origin_x509.subject_dn

Shorthand for the same existence check on a normal field:

    dns_a:*

Check if a field does not exist:

    NOT _exists_:dns_a

Inequality and slash form
Not equal (same shape as other comparisons):

    kind:!=typosquatting

Slash-delimited pattern (parsed separately from wildcards; the rule matcher currently treats the inner text as a substring match, not full regular expression semantics):

    http_banner:/nginx/

Number Fields
Numeric fields (scores, edit distance, autonomous system numbers in geolocation.asn.number, and similar) support comparison operators:
Comparison Operators

    levenshtein_distance:>5    # greater than
    levenshtein_distance:>=5   # greater than or equal
    levenshtein_distance:<10   # less than
    levenshtein_distance:<=10  # less than or equal
    levenshtein_distance:=7    # equal to

Practical Examples

    # High phishing score
    classification.phishing:>0.8

    # ASN in specific range
    geolocation.asn.number:>=10000 AND geolocation.asn.number:<=20000

    # Low edit distance
    levenshtein_distance:<3

Boolean Fields
Boolean fields can be matched directly:

    origin_x509.is_ca:true
    origin_x509.is_ca:false
    certificate_transparency.is_precert:true

Date Fields
Date fields support comparison operators and special time-based functions:
Date Comparisons

    # RFC3339 format dates
    registration_metadata.registration_date:>"2024-01-01T00:00:00Z"
    origin_x509.not_after:<"2025-12-31T23:59:59Z"

Days Since/Until
Calculate days from current time:

    # Registered within last 30 days
    registration_metadata.registration_date.days_since:<=30

    # Expires in more than 90 days
    registration_metadata.expiration_date.days_until:>90

    # Certificate valid for at least 365 days
    origin_x509.not_after.days_until:>=365

    # Certificate issued within last 7 days
    origin_x509.not_before.days_since:<=7

Suffix comparisons require a numeric right-hand side (days). An explicit operator may be omitted on suffix forms to mean equality (for example field.days_since:30).
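
The suffix semantics can be pictured as a whole-day difference between the field's timestamp and the evaluation time. The Python sketch below is an assumption about how such a value could be derived, not the engine's implementation. In particular, counting only whole elapsed days and the function names `days_since` and `days_until` (which simply mirror the suffixes) are choices made for illustration.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_since(value: str, now: datetime) -> int:
    """Whole days elapsed between the timestamp and `now`."""
    return (now - parse_rfc3339(value)).days


def days_until(value: str, now: datetime) -> int:
    """Whole days remaining between `now` and the timestamp."""
    return (parse_rfc3339(value) - now).days


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    print(days_since("2024-01-01T00:00:00Z", now))  # 30
    print(days_until("2025-01-01T00:00:00Z", now))  # 336
```

Under these assumptions, a domain registered on 2024-01-01 and evaluated on 2024-01-31 would satisfy `registration_metadata.registration_date.days_since:<=30`.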
Array Fields
Array fields represent lists of values (e.g., DNS records, technologies).
Contains (Any Element Matches)
Use @ prefix for “any element contains”:

    # Any DNS A record contains "192.168"
    dns_a:@*192.168*

    # Any technology contains "Google Maps"
    technologies:@*Google\ Maps*

    # Any technology starts with "Apache"
    technologies:@Apache*

    # Any nameserver ends with "cloudflare.com"
    dns_ns:@*cloudflare.com

The same meaning without explicit wildcards in the token (still substring match per element):

    technologies:@Google
    dns_a:@192.168

All Elements Match
Use @@ prefix to require all elements match:

    # All nameservers contain "cloudflare"
    dns_ns:@@*cloudflare*

    # All MX records end with ".google.com"
    dns_mx:@@*.google.com

Without explicit wildcards:

    dns_ns:@@cloudflare

Array Aggregations
Query array properties:

    # Array has more than 5 elements
    technologies.len:>5

    # Check minimum/maximum values in numeric arrays
    dns_a.len:>=2
    scores.min:>=0.1
    scores.max:<=0.9

There is no .avg: suffix in the Lucene parser; do not use field.avg: in Upsilon.
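
The difference between the @ and @@ prefixes, and the .len, .min and .max suffixes, maps onto familiar list operations. The following Python sketch mirrors the semantics described in this section for a plain substring pattern without wildcards. The function names and sample data are hypothetical, and how the engine treats an empty array under @@ is not stated on this page, so the sketch's behaviour there (Python's `all()` is true for an empty list) is an assumption.

```python
def any_contains(elements: list[str], needle: str) -> bool:
    """@ prefix: at least one element contains the substring."""
    return any(needle in element for element in elements)


def all_contain(elements: list[str], needle: str) -> bool:
    """@@ prefix: every element contains the substring."""
    return all(needle in element for element in elements)


if __name__ == "__main__":
    dns_ns = ["ns1.cloudflare.com", "ns2.cloudflare.com", "ns.example.net"]
    scores = [0.2, 0.5, 0.85]

    print(any_contains(dns_ns, "cloudflare"))  # True: two of three match
    print(all_contain(dns_ns, "cloudflare"))   # False: ns.example.net does not

    # .len, .min and .max correspond to len(), min() and max().
    print(len(dns_ns) >= 2, min(scores) >= 0.1, max(scores) <= 0.9)
```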
IP Address Fields (inet)
IP address fields support special operators for network matching:
Exact IP Match
Use # prefix for exact IP address matching:

    # Match exact IPv4 address
    dns_a:#192.168.1.1

    # Match exact IPv6 address
    dns_aaaa:#2001:db8::1

CIDR Range Matching
Use # prefix with CIDR notation to match IP addresses within a subnet:

    # Match any IP in the 192.168.0.0/16 subnet
    dns_a:#192.168.0.0/16

    # Match IPv6 subnet
    dns_aaaa:#2001:db8::/32

    # Match private IP ranges
    dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16

Combining IP Queries
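
Conceptually, each # clause tests whether a resolved address falls inside a network, with an exact address behaving like a single-host network; OR-combined clauses then test several networks in turn. The Python sketch below illustrates this with the standard `ipaddress` module. `matches_inet` is a hypothetical name, and treating an array field as "any element matches" is an assumption rather than documented engine behaviour.

```python
import ipaddress


def matches_inet(addresses: list[str], pattern: str) -> bool:
    """True if any address lies in the network (or equals the address) in pattern."""
    # strict=False tolerates host bits in the pattern (e.g. 192.168.1.1/16);
    # a bare address parses as a single-host network.
    network = ipaddress.ip_network(pattern, strict=False)
    for raw in addresses:
        address = ipaddress.ip_address(raw)
        # An IPv4 address never matches an IPv6 network, and vice versa.
        if address.version == network.version and address in network:
            return True
    return False


if __name__ == "__main__":
    private = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
    dns_a = ["192.168.1.1"]
    # Equivalent in spirit to the OR-combined private-range query.
    print(any(matches_inet(dns_a, net) for net in private))  # True
```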

    # Domain resolves to AWS IP space
    dns_a:#54.0.0.0/8 OR dns_a:#52.0.0.0/8

    # Certificate SAN includes localhost
    origin_x509.san_ip:#127.0.0.1

List Membership
Check if a field value is in a list of values:

    # ASN is one of several suspicious providers
    geolocation.asn.number:[12345 67890 13335]

    # Country is in high-risk regions
    geolocation.country:[CN RU KP]

    # Status code is one of the listed transfer locks
    registration_metadata.status_codes:[clientTransferProhibited serverTransferProhibited]

For array-typed fields, list membership is evaluated as “any element equals any listed value.”
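
The rule for array-typed fields amounts to a non-empty intersection between the field's elements and the listed values. A minimal Python sketch, assuming values are compared as exact strings (`in_list` is a hypothetical name; numeric comparison and case handling are not described on this page and are ignored here):

```python
def in_list(field_value, listed: list[str]) -> bool:
    """Scalar: value is in the list. Array: any element is in the list."""
    listed_set = set(listed)
    if isinstance(field_value, (list, tuple)):
        return any(element in listed_set for element in field_value)
    return field_value in listed_set


if __name__ == "__main__":
    locks = ["clientTransferProhibited", "serverTransferProhibited"]
    # Array field: one element matching is enough.
    print(in_list(["clientHold", "clientTransferProhibited"], locks))  # True
    # Scalar field: plain membership.
    print(in_list("US", ["CN", "RU", "KP"]))  # False
```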
Field references
Compare against another field’s runtime value using a $ prefix on the right-hand side (both names must exist in the schema):

    some_numeric_field:>$other_numeric_field

Web Enrichment Queries
Use crawl-derived fields to detect for-sale pages, shell sites, and reused visual assets:

    # For-sale parking page with marketplace branding
    sitemap.title:*for\ sale* OR business_intel.company_names:Spaceship.com

    # Social-link shell site
    sitemap.external_link.target_host:[facebook.com instagram.com twitter.com tiktok.com]

    # Reused favicon fingerprint
    favicon.sha256:18617a981991607982022c0a36a9c958935e4e614e5343f9f1b528844d939aed

    # Placeholder-heavy landing pages
    sitemap.broken_link.kind:placeholder_anchor AND sitemap.broken_link_count:>0

These selector-backed fields behave like regular string and number fields even though the raw export stores them inside nested objects and arrays:

    # Any crawled page title matches
    sitemap.title:*login*

    # Any extracted company name matches
    business_intel.company_names:*holdings*

    # Any open-port finding matches
    ports.port:[80 443 8443]

Port Scan Queries
Use port scan fields for exposed services and infrastructure triage:

    # Mail services exposed on the same host
    ports.port:[25 110 143 587 993]

    # More than two distinct services exposed
    ports.unique_port_count:>2

    # Scan reached multiple IPs
    ports.unique_address_count:>1

Complete Examples
Infrastructure Detection
Detect domains hosted on suspicious infrastructure:

    # Suspicious ASN with high phishing score
    geolocation.asn.number:16509 AND classification.phishing:>0.7

    # Domains outside expected geographic regions
    NOT geolocation.country:[US CA GB FR DE] AND levenshtein_distance:<5

    # Hosted on Cloudflare with suspicious technology
    dns_ns:@*cloudflare* AND technologies:@*phishing*

Registration Patterns
Detect suspicious domain registration characteristics:

    # Recently registered with suspicious registrar
    registration_metadata.registration_date.days_since:<=30 AND
    registration_metadata.registrar:*namecheap*

    # Domain expires soon (potential abandonment)
    registration_metadata.expiration_date.days_until:<=90

    # Suspicious status codes
    registration_metadata.status_codes:*clientHold* OR
    registration_metadata.status_codes:*pendingDelete*

Certificate Analysis
Analyze TLS certificate properties:

    # Certificate with many Subject Alternative Names
    origin_x509.san_dns_count:>50

    # Self-signed or suspicious issuer
    origin_x509.is_ca:true OR origin_x509.issuer_dn:*self-signed*

    # Recently issued certificate
    origin_x509.not_before.days_since:<=7

    # Certificate includes wildcard DNS entry
    origin_x509.san_dns:@**

Certificate transparency (CT)
Query certificate transparency (CT) log data:

    # Domain seen frequently in certificate transparency logs
    certificate_transparency.occurrences_count:>10

    # Precertificate entry
    certificate_transparency.is_precert:true

    # Recently seen in certificate transparency logs (Unix timestamp)
    certificate_transparency.last_seen_ts:>1704067200

DNS Patterns
Detect suspicious DNS configurations:

    # Uses nameservers under google.com
    dns_ns:@*google.com

    # No MX records (suspicious for business domains)
    NOT _exists_:dns_mx AND levenshtein_distance:<3

    # Resolves to private IP space
    dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16

Multi-Signal Detection
Combine multiple signals for high-confidence detection:

    # High-confidence phishing detection
    (
      permutation:*login* OR
      permutation:*account* OR
      permutation:*verify*
    ) AND
    registration_metadata.registration_date.days_since:<=14 AND
    classification.phishing:>0.8 AND
    NOT geolocation.country:[US CA GB]

    # Suspicious technology stack
    technologies:@*WordPress* AND
    technologies:@*PHP* AND
    origin_x509.issuer_dn:*Let's\ Encrypt* AND
    registration_metadata.registration_date.days_since:<=7

    # Parked typo domain with marketplace evidence
    permutation:*microsoft* AND
    (
      sitemap.title:*for\ sale* OR
      business_intel.company_names:Spaceship.com OR
      sitemap.external_link.target_host:www.spaceship.com
    )

    # Brand-like shell site with social outbound links and extracted business details
    permutation:*microboft* AND
    sitemap.external_link.target_host:[facebook.com instagram.com] AND
    business_intel.company_name_count:>0

    # External email discovery observed through Microsoft
    metadata.origin:external_email AND
    metadata.sources.provider:microsoft AND
    metadata.sources.kind:external_email

Operator Precedence
Operators are evaluated in the following order (highest to lowest):
- Parentheses ()
- NOT
- AND
- OR
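
Python's `not`, `and` and `or` happen to share the same relative precedence as the list above, so a short Python sketch can show why grouping matters. This is an analogy only; it does not exercise the Upsilon parser.

```python
# Three conditions standing in for field comparisons.
a, b, c = True, False, False

# Without parentheses, "and" binds tighter than "or": a or (b and c).
implicit = a or b and c

# Parentheses force the "or" to be evaluated first.
grouped_or_first = (a or b) and c

print(implicit)          # True
print(grouped_or_first)  # False
```

The two expressions differ for the same inputs, which is why an ungrouped mix of AND and OR is easy to misread.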
Always use parentheses to make complex queries explicit:

    # Explicit precedence (recommended)
    (field1:value1 OR field2:value2) AND field3:value3

    # Implicit precedence (not the same as above: AND binds tighter than OR,
    # so this is evaluated as field1:value1 OR (field2:value2 AND field3:value3))
    field1:value1 OR field2:value2 AND field3:value3

Best Practices
1. Be Specific
Use precise field comparisons rather than broad wildcards:

    # Good - specific pattern
    permutation:*-login.com

    # Less optimal - too broad
    permutation:*login*

2. Combine Multiple Signals
Create robust rules by combining multiple indicators:

    # Strong signal combination
    registration_metadata.registration_date.days_since:<=30 AND
    classification.phishing:>0.7 AND
    NOT geolocation.asn.number:[15169 13335 16509]

3. Use Appropriate Operators
Choose the right operator for the field type:

    # Good - uses CIDR for IP ranges
    dns_a:#192.168.0.0/16

    # Less optimal - uses wildcards for IPs
    dns_a:*192.168*

4. Test rules
Start with conservative conditions and refine based on results:

    # Start conservative
    classification.phishing:>0.9 AND registration_metadata.registration_date.days_since:<=7

    # Refine based on false positives/negatives
    classification.phishing:>0.8 AND registration_metadata.registration_date.days_since:<=14

Field Reference
For a complete list of available fields and their types, see the Signals Reference.
Field Type Quick Reference
| Type | Examples | Operators |
|---|---|---|
| string | permutation, http_banner | :, !=, *:*, :*, *:, _exists_:, field:*, /.../ (substring in matcher) |
| number | levenshtein_distance, geolocation.asn.number | :>, :>=, :<, :<=, :=, != |
| boolean | origin_x509.is_ca, certificate_transparency.is_precert | :true, :false |
| date | registration_metadata.registration_date | :>, :<, .days_since:, .days_until: |
| array<string> | dns_ns, technologies | :@, :@@, wildcards, .len:, .min:, .max: |
| array<inet> | dns_a, dns_aaaa | :#, :#cidr, :@ (text-style containment on serialized values) |
| any | (n/a) | RHS $otherField |
Error Handling
Common Errors
Field not found

    unknown_field:value
    # Error: Field 'unknown_field' does not exist

Invalid operator for type

    permutation:>value
    # Error: String fields don't support '>' operator

Missing required field

    # Rule references field that doesn't exist in event
    # Error: Field 'dns_a' not found in event

Validation
Rules are validated before execution to catch errors early:
- Field names must exist in the schema
- Operators must be appropriate for the field type
- Syntax must be valid
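
The first two checks can be pictured as lookups against a schema that maps field names to types. The Python sketch below is a toy model under stated assumptions: the `SCHEMA` excerpt, the `ALLOWED` operator table and the `validate` function are all hypothetical, the operator sets are abbreviated from the quick reference table, and the error strings only imitate the examples on this page.

```python
# Hypothetical schema excerpt: field name -> field type.
SCHEMA = {
    "permutation": "string",
    "levenshtein_distance": "number",
    "origin_x509.is_ca": "boolean",
}

# Abbreviated operator sets per type (assumed, not exhaustive).
ALLOWED = {
    "string": {":", "!="},
    "number": {":", "!=", ">", ">=", "<", "<=", "="},
    "boolean": {":"},
}


def validate(field: str, operator: str) -> list[str]:
    """Return a list of validation errors for one comparison (empty if valid)."""
    field_type = SCHEMA.get(field)
    if field_type is None:
        return [f"Field '{field}' does not exist"]
    if operator not in ALLOWED[field_type]:
        return [f"{field_type.capitalize()} fields don't support '{operator}' operator"]
    return []


if __name__ == "__main__":
    print(validate("unknown_field", ":"))
    print(validate("permutation", ">"))
    print(validate("levenshtein_distance", ">"))
```

Checking every comparison this way before a rule runs is what allows errors such as an unknown field to surface at save time rather than during evaluation.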
Additional Resources
- Rules Engine Guide - How to create and manage rules
- Signals Reference - Complete field documentation
- API Documentation - Programmatic access to rules