Upsilon Query Syntax
Upsilon is the query language used in Have I Been Squatted’s Rules Engine. It provides a powerful, expressive syntax for creating detection rules that match domain permutations based on various signals and patterns.
Query Syntax Overview
Upsilon uses a Lucene-like query syntax that’s intuitive and easy to read. Queries are composed of field comparisons combined with logical operators.
Basic Structure
    field:value
    field:>value
    field:*pattern*
Escaping and quoting
Upsilon tokenizes unquoted values on whitespace. If your value contains spaces, you have two options depending on the operator:
- Quoted strings (exact match + list membership):
    title:"Google Maps"
    technologies:["Google Analytics" "Google Tag Manager" "Nginx"]
- Escaped spaces for wildcard / array operators (*...*, @..., @*...*, @@..., @@*...*):
    technologies:@*Google\ Maps*
    http_banner:*nginx\ 1.18*
You can also escape delimiter characters in unquoted tokens using a backslash (e.g. \(, \), \[, \], \,, \\).
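When queries are generated programmatically, the escaping rule above can be applied by a small helper. The following Python sketch is illustrative only: the function names are hypothetical, and the set of escaped characters is an assumption based on the examples listed above (the real tokenizer may treat additional characters, such as a literal `*`, as special).

```python
# Characters that get a backslash in unquoted tokens.
# Assumption: a space plus the delimiters given as examples in the text.
_SPECIAL = set(' ()[],\\')


def escape_unquoted(value: str) -> str:
    """Backslash-escape spaces and delimiter characters in an unquoted value."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in value)


def contains_any(field: str, needle: str) -> str:
    """Build an 'any element contains' clause (hypothetical helper)."""
    return f"{field}:@*{escape_unquoted(needle)}*"


print(contains_any("technologies", "Google Maps"))
```

For exact matches and list membership, quoting the value is simpler and avoids escaping altogether.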
Logical Operators
Combine conditions using AND, OR, and NOT:
    field1:value1 AND field2:value2
    field1:value1 OR field2:value2
    NOT field:value
    (field1:value1 OR field2:value2) AND field3:value3
Operators by Data Type
String Fields
String fields support various matching patterns:
Exact Match
    permutation:amazon.com
    kind:typosquatting
Wildcard Matching
Contains - Match strings containing a substring:
    permutation:*amazon*
    http_banner:*nginx*
Starts With - Match strings beginning with a prefix:
    permutation:amazon*
    dns_ns:ns1*
Ends With - Match strings ending with a suffix:
    permutation:*.com
    dns_mx:*mail.protection.outlook.com
Existence Check
Check if a field exists (has any value):
    _exists_:whois
    _exists_:origin_x509.subject
Check if a field does not exist:
    NOT _exists_:dns_a
Number Fields
Number fields support comparison operators:
Comparison Operators
    levenshtein_distance:>5 # greater than
    levenshtein_distance:>=5 # greater than or equal
    levenshtein_distance:<10 # less than
    levenshtein_distance:<=10 # less than or equal
    levenshtein_distance:=7 # equal to
Practical Examples
    # High phishing score
    classification.phishing:>0.8
    # ASN in specific range
    geolocation.asn.number:>=10000 AND geolocation.asn.number:<=20000
    # Low edit distance
    levenshtein_distance:<3
Boolean Fields
Boolean fields can be matched directly:
    origin_x509.is_ca:true
    origin_x509.is_ca:false
    certificate_transparency.is_precert:true
Date Fields
Date fields support comparison operators and special time-based functions:
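As a rough illustration of the day-based suffixes covered in this section (`.days_since` and `.days_until`), the Python sketch below computes whole-day differences from an RFC 3339 timestamp. The function names are hypothetical, and truncating to whole days is an assumption; the engine's actual rounding behaviour is not specified here.

```python
from datetime import datetime, timezone


def _parse(ts: str) -> datetime:
    # Accept the trailing "Z" used in RFC 3339 UTC timestamps.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def days_since(ts: str, now: datetime) -> int:
    """Whole days elapsed from `ts` to `now` (assumed semantics)."""
    return (now - _parse(ts)).days


def days_until(ts: str, now: datetime) -> int:
    """Whole days remaining from `now` to `ts` (assumed semantics)."""
    return (_parse(ts) - now).days


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(days_since("2024-01-01T00:00:00Z", now))  # 30
print(days_until("2024-02-10T00:00:00Z", now))  # 10
```

Under these assumptions, a domain registered on 2024-01-01 would satisfy `registration_metadata.registration_date.days_since:<=30` when evaluated on 2024-01-31.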
Date Comparisons
    # RFC3339 format dates
    registration_metadata.registration_date:>"2024-01-01T00:00:00Z"
    origin_x509.not_after:<"2025-12-31T23:59:59Z"
Days Since/Until
Compare the number of days between a date field and the current time:
    # Registered within last 30 days
    registration_metadata.registration_date.days_since:<=30
    # Expires in more than 90 days
    registration_metadata.expiration_date.days_until:>90
    # Certificate valid for at least 365 days
    origin_x509.not_after.days_until:>=365
    # Certificate issued within last 7 days
    origin_x509.not_before.days_since:<=7
Array Fields
Array fields represent lists of values (e.g., DNS records, technologies).
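The `@` and `@@` prefixes described in this section behave like "any" and "all" quantifiers over the array. The Python sketch below models that idea. It is not the engine's implementation: the helper names are hypothetical, matching is assumed to be case-sensitive, and the result of `@@` on an empty array is an assumption (Python's `all()` returns `True` for an empty list).

```python
import re


def matches(value: str, pattern: str) -> bool:
    """Match `value` against a pattern where `*` means 'any run of characters'."""
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.fullmatch(".*".join(parts), value) is not None


def any_match(values, pattern):
    """Rough model of field:@pattern (at least one element matches)."""
    return any(matches(v, pattern) for v in values)


def all_match(values, pattern):
    """Rough model of field:@@pattern (every element matches)."""
    return all(matches(v, pattern) for v in values)


dns_ns = ["ns1.cloudflare.com", "ns2.example.net"]
print(any_match(dns_ns, "*cloudflare*"))  # True
print(all_match(dns_ns, "*cloudflare*"))  # False
```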
Contains (Any Element Matches)
Use the @ prefix to match when any element of the array matches the pattern:
    # Any DNS A record contains "192.168"
    dns_a:@*192.168*
    # Any technology contains "Google Maps"
    technologies:@*Google\ Maps*
    # Any technology starts with "Apache"
    technologies:@Apache*
    # Any nameserver ends with "cloudflare.com"
    dns_ns:@*cloudflare.com
All Elements Match
Use the @@ prefix to require that all elements match:
    # All nameservers contain "cloudflare"
    dns_ns:@@*cloudflare*
    # All MX records end with ".google.com"
    dns_mx:@@*.google.com
Array Aggregations
Query array properties:
    # Array has more than 5 elements
    technologies.len:>5
    # Array has at least 2 elements
    dns_a.len:>=2
IP Address Fields (inet)
IP address fields support special operators for network matching:
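To make the difference between network matching and text matching concrete, the following Python sketch uses the standard `ipaddress` module to model the `#` operator covered in this section. The function name `inet_matches` is hypothetical, and treating a bare address as a single-host network is an assumption about how exact matches behave.

```python
import ipaddress


def inet_matches(addresses, target: str) -> bool:
    """Rough model of field:#target, where target is an IP or a CIDR range."""
    # A bare address such as "192.168.1.1" becomes a single-host network.
    network = ipaddress.ip_network(target, strict=False)
    return any(ipaddress.ip_address(a) in network for a in addresses)


print(inet_matches(["192.168.1.1"], "192.168.0.0/16"))   # True
print(inet_matches(["10.192.168.5"], "192.168.0.0/16"))  # False
print("192.168" in "10.192.168.5")                       # True (text match)
```

The last line shows why a substring wildcard is a poor substitute for CIDR matching: the text "192.168" appears inside an address that is not in the 192.168.0.0/16 subnet.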
Exact IP Match
Use the # prefix for exact IP address matching:
    # Match exact IPv4 address
    dns_a:#192.168.1.1
    # Match exact IPv6 address
    dns_aaaa:#2001:db8::1
CIDR Range Matching
Use the # prefix with CIDR notation to match IP addresses within a subnet:
    # Match any IP in the 192.168.0.0/16 subnet
    dns_a:#192.168.0.0/16
    # Match IPv6 subnet
    dns_aaaa:#2001:db8::/32
    # Match private IP ranges
    dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16
Combining IP Queries
    # Domain resolves to AWS IP space
    dns_a:#54.0.0.0/8 OR dns_a:#52.0.0.0/8
    # Certificate SAN includes localhost
    origin_x509.san_ip:#127.0.0.1
List Membership
Check if a field value is in a list of values:
    # ASN is one of several suspicious providers
    geolocation.asn.number:[12345 67890 13335]
    # Country is in high-risk regions
    geolocation.country:[CN RU KP]
    # Status code is one of the listed transfer locks
    registration_metadata.status_code:[clientTransferProhibited serverTransferProhibited]
Complete Examples
Infrastructure Detection
Detect domains hosted on suspicious infrastructure:
    # Suspicious ASN with high phishing score
    geolocation.asn.number:16509 AND classification.phishing:>0.7
    # Domains outside expected geographic regions
    NOT geolocation.country:[US CA GB FR DE] AND levenshtein_distance:<5
    # Hosted on Cloudflare with suspicious technology
    dns_ns:@*cloudflare* AND technologies:@*phishing*
Registration Patterns
Detect suspicious domain registration characteristics:
    # Recently registered with suspicious registrar
    registration_metadata.registration_date.days_since:<=30 AND
    registration_metadata.registrar:*namecheap*
    # Domain expires soon (potential abandonment)
    registration_metadata.expiration_date.days_until:<=90
    # Suspicious status codes
    registration_metadata.status_code:*clientHold* OR
    registration_metadata.status_code:*pendingDelete*
Certificate Analysis
Analyze TLS certificate properties:
    # Certificate with many Subject Alternative Names
    origin_x509.san_dns_count:>50
    # Self-signed or suspicious issuer
    origin_x509.is_ca:true OR origin_x509.issuer:*self-signed*
    # Recently issued certificate
    origin_x509.not_before.days_since:<=7
    # Certificate includes wildcard DNS entry
    origin_x509.san_dns:@**
Certificate Transparency
Query Certificate Transparency log data:
    # Domain seen frequently in CT logs
    certificate_transparency.occurrences_count:>10
    # Precertificate entry
    certificate_transparency.is_precert:true
    # Recently seen in CT logs (Unix timestamp)
    certificate_transparency.last_seen_ts:>1704067200
DNS Patterns
Detect suspicious DNS configurations:
    # Any nameserver ends with "google.com"
    dns_ns:@*google.com
    # No MX records (suspicious for business domains)
    NOT _exists_:dns_mx AND levenshtein_distance:<3
    # Resolves to private IP space
    dns_a:#10.0.0.0/8 OR dns_a:#172.16.0.0/12 OR dns_a:#192.168.0.0/16
Multi-Signal Detection
Combine multiple signals for high-confidence detection:
    # High-confidence phishing detection
    (permutation:*login* OR permutation:*account* OR permutation:*verify*) AND
    registration_metadata.registration_date.days_since:<=14 AND
    classification.phishing:>0.8 AND
    NOT geolocation.country:[US CA GB]
    # Suspicious technology stack (the space in the wildcard value is escaped)
    technologies:@*WordPress* AND
    technologies:@*PHP* AND
    origin_x509.issuer:*Let's\ Encrypt* AND
    registration_metadata.registration_date.days_since:<=7
Operator Precedence
Operators are evaluated in the following order (highest to lowest):
- Parentheses ()
- NOT
- AND
- OR
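Python's boolean operators happen to use the same relative ordering (`not`, then `and`, then `or`), so a few lines of Python can show how much grouping matters. This is only an analogy, assuming Upsilon evaluates exactly as the list above states; the variables stand in for the results of three field comparisons.

```python
# Stand-ins for the results of field1:value1, field2:value2, field3:value3.
f1, f2, f3 = True, False, False

implicit = f1 or f2 and f3        # AND binds tighter than OR
grouped_and = f1 or (f2 and f3)   # what the implicit form means
grouped_or = (f1 or f2) and f3    # a different query

print(implicit, grouped_and, grouped_or)  # True True False
```

With these values the ungrouped expression matches, while the version that groups the OR first does not.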
Always use parentheses to make complex queries explicit:
    # Explicit precedence (recommended)
    (field1:value1 OR field2:value2) AND field3:value3
    # Implicit precedence (AND binds tighter than OR, so this is evaluated as
    # field1:value1 OR (field2:value2 AND field3:value3), which is not the query above)
    field1:value1 OR field2:value2 AND field3:value3
Best Practices
1. Be Specific
Use precise field comparisons rather than broad wildcards:
    # Good - specific pattern
    permutation:*-login.com
    # Less optimal - too broad
    permutation:*login*
2. Combine Multiple Signals
Create robust rules by combining multiple indicators:
    # Strong signal combination
    registration_metadata.registration_date.days_since:<=30 AND
    classification.phishing:>0.7 AND
    NOT geolocation.asn.number:[15169 13335 16509]
3. Use Appropriate Operators
Choose the right operator for your data type:
    # Good - uses CIDR for IP ranges
    dns_a:#192.168.0.0/16
    # Less optimal - uses wildcards for IPs
    dns_a:*192.168*
4. Test Your Rules
Start with conservative conditions and refine based on results:
    # Start conservative
    classification.phishing:>0.9 AND registration_metadata.registration_date.days_since:<=7
    # Refine based on false positives/negatives
    classification.phishing:>0.8 AND registration_metadata.registration_date.days_since:<=14
Field Reference
For a complete list of available fields and their types, see the Signals Reference.
Field Type Quick Reference
| Type | Examples | Operators |
|---|---|---|
| string | permutation, http_banner | :, :*...*, :...*, :*..., _exists_: |
| number | levenshtein_distance, geolocation.asn.number | :>, :>=, :<, :<=, := |
| boolean | origin_x509.is_ca, certificate_transparency.is_precert | :true, :false |
| date | registration_metadata.registration_date | :>, :<, .days_since:, .days_until: |
| array<string> | dns_ns, technologies | :@*, :@@*, .len: |
| array<inet> | dns_a, dns_aaaa | :#, :#cidr, :@* |
Error Handling
Common Errors
Field not found
    unknown_field:value
    # Error: Field 'unknown_field' does not exist
Invalid operator for type
    permutation:>value
    # Error: String fields don't support '>' operator
Missing required field
    # Rule references field that doesn't exist in event
    # Error: Field 'dns_a' not found in event
Validation
Rules are validated before execution to catch errors early:
- Field names must exist in the schema
- Operators must be appropriate for the field type
- Syntax must be valid
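The first two checks above can be pictured as a lookup against a schema. The Python sketch below is a simplified model, not the Rules Engine's validator: the schema excerpt, the operator sets, and the `validate` function are all hypothetical, and the error strings simply mirror the examples under Common Errors.

```python
from typing import Optional

# Hypothetical schema excerpt mapping field names to types.
FIELD_TYPES = {
    "permutation": "string",
    "levenshtein_distance": "number",
    "origin_x509.is_ca": "boolean",
}

# Assumed operator sets per type (simplified).
OPERATORS_BY_TYPE = {
    "string": {":"},
    "number": {":", ":>", ":>=", ":<", ":<=", ":="},
    "boolean": {":"},
}


def validate(field: str, operator: str) -> Optional[str]:
    """Return an error message, or None if the comparison looks valid."""
    field_type = FIELD_TYPES.get(field)
    if field_type is None:
        return f"Field '{field}' does not exist"
    if operator not in OPERATORS_BY_TYPE[field_type]:
        symbol = operator.lstrip(":")
        return f"{field_type.capitalize()} fields don't support '{symbol}' operator"
    return None


print(validate("unknown_field", ":"))
print(validate("permutation", ":>"))
print(validate("levenshtein_distance", ":>="))
```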
Additional Resources
- Rules Engine Guide - How to create and manage rules
- Signals Reference - Complete field documentation
- API Documentation - Programmatic access to rules