Data dictionary

Introduction

The table below describes all the fields included in the OpenINTEL Avro file format. Note that our Avro schema has evolved over time, so not all fields will be present in every file.

It is also important to note that OpenINTEL uses sparse storage of record data. Each row contains the same fields, but fields are only set to a value if that value is meaningful in the context of the DNS record type for that result row. All other fields are set to NULL.
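
As a minimal sketch of what sparse storage means in practice, the snippet below drops the NULL fields from a single result row. It assumes each row has already been loaded as a Python dict with NULL mapped to `None`; the helper name `populated_fields` and the sample values are hypothetical and only serve as illustration.

```python
from typing import Any, Dict


def populated_fields(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the fields that carry a value (drop NULL/None entries)."""
    return {name: value for name, value in row.items() if value is not None}


# Hypothetical result row for an A record; the values are made up.
example_row = {
    "query_type": "A",
    "query_name": "example.com.",
    "response_type": "A",
    "response_name": "example.com.",
    "ip4_address": "192.0.2.1",
    "ip6_address": None,  # not meaningful for an A record
    "cname_name": None,   # not meaningful for an A record
    "mx_address": None,   # not meaningful for an A record
}

print(populated_fields(example_row))
```

When reading a single field, `row.get("field_name")` returns `None` both for a NULL value and for a field that is absent because the file was written with an older schema, so the two cases can be handled in the same way.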

Field Datatype Description
query_type STRING original query type sent by the worker (‘A’, ‘AAAA’, …)
query_name STRING original query name sent by the worker
response_type STRING response type received by the worker (‘A’, ‘AAAA’, …); note that this may differ from the query_type, e.g. if an A query returns a CNAME
response_name STRING response name received by the worker; this is not canonicalized (i.e. if the response received contains capital letters, this is copied verbatim); note also that this may differ from the original query_name value, in case of, e.g. a CNAME
response_ttl INTEGER the DNS time-to-live for the record as observed by the OpenINTEL worker; note that this may not be 100% accurate as each worker sits behind its own caching resolver
rtt DOUBLE query round-trip time in (fractions of) seconds
timestamp LONG epoch timestamp in milliseconds (this makes the timestamp usable in MapReduce jobs); precision is at the seconds level. This is the timestamp as recorded by the worker
worker_id INTEGER numeric value indicating the OpenINTEL worker node that performed the measurement
status_code INTEGER RCODE from the response; 65535 is a special value and means that the worker was unable to complete the query due to a timeout
ad_flag INTEGER Boolean value indicating if the response was DNSSEC valid
ip4_address STRING IPv4 address associated with an A record
ip6_address STRING IPv6 address associated with a AAAA record
country STRING two letter country code for Geo IP lookup of IPv4/IPv6 address; source is IP2Location free dataset
as STRING Autonomous System number linked to IPv4/IPv6 address; source is CAIDA’s Prefix-to-AS dataset
as_full STRING Full set of AS numbers linked to IPv4/IPv6 address; contains additional data in case of multi-homed prefixes (note: new format is a JSON object)
ip_prefix STRING Closest enclosing announced prefix containing the address; source is CAIDA’s Prefix-to-AS dataset
cname_name STRING the name from the RDATA field of a CNAME
dname_name STRING the name from the RDATA field of a DNAME
mx_address STRING the name from the RDATA field of an MX record
mx_preference INTEGER the preference value from the RDATA field of an MX record
mxset_hash_algorithm STRING the hash algorithm used to hash the MX RRset
mxset_hash STRING hash over all MX records in the MX RRset received in response to an MX query; records are ordered canonically and then hashed. This field can be used as a quick-and-dirty method to group domains with identical MX RRsets, but note that input names are not converted to lower case before hashing. Preference values are not included in the hashed data, only names.
ns_address STRING the name from the RDATA field of an NS record
nsset_hash_algorithm STRING the hash algorithm used to hash the NS RRset
nsset_hash STRING hash over all NS records in the NS RRset received in response to an NS query; records are ordered canonically and then hashed. This field can be used as a quick-and-dirty method to group domains with identical NS RRsets, but note that input names are not converted to lower case before hashing.
txt_text STRING concatenation of all RDATA fields in a TXT record. Individual values are enclosed in quotes (“)
txt_hash_algorithm STRING the hash algorithm used to hash the TXT RRset
txt_hash STRING hash over all TXT records in the TXT RRset received in response to a TXT query; records are ordered canonically and then hashed. This field can be used as a quick-and-dirty method to group domains with identical TXT RRsets.
ds_key_tag INTEGER key tag value from DS RDATA
ds_algorithm INTEGER algorithm value from DS RDATA
ds_digest_type INTEGER hash algorithm ID from DS RDATA
ds_digest STRING hash value from DS RDATA
cds_key_tag INTEGER key tag value from CDS RDATA
cds_algorithm INTEGER algorithm value from CDS RDATA
cds_digest_type INTEGER hash algorithm ID from CDS RDATA
cds_digest STRING hash value from CDS RDATA
dnskey_flags INTEGER flags field from DNSKEY RDATA
dnskey_protocol INTEGER protocol field from DNSKEY RDATA
dnskey_algorithm INTEGER algorithm field from DNSKEY RDATA
dnskey_pk_rsa_n STRING hexadecimal value representing the RSA modulus from a DNSKEY record containing an RSA key (this value is often referred to as ‘n’ in cryptography textbooks)
dnskey_pk_rsa_e STRING hexadecimal value representing the RSA public exponent from a DNSKEY record containing an RSA key (this value is often referred to as ‘e’ in cryptography textbooks)
dnskey_pk_rsa_bitsize INTEGER key length in case the DNSKEY record contains an RSA public key
dnskey_pk_eccgost_x STRING hexadecimal value representing part ‘X’ of an ECDSA or GOST public key in case the DNSKEY record is for algorithms 12, 13 or 14
dnskey_pk_eccgost_y STRING hexadecimal value representing part ‘Y’ of an ECDSA or GOST public key in case the DNSKEY record is for algorithms 12, 13 or 14
dnskey_pk_dsa_t STRING hexadecimal value representing part ‘T’ of a “classic” DSA key
dnskey_pk_dsa_q STRING hexadecimal value representing part ‘Q’ of a “classic” DSA key
dnskey_pk_dsa_p STRING hexadecimal value representing part ‘P’ of a “classic” DSA key
dnskey_pk_dsa_g STRING hexadecimal value representing part ‘G’ of a “classic” DSA key
dnskey_pk_dsa_y STRING hexadecimal value representing part ‘Y’ of a “classic” DSA key
dnskey_pk_eddsa_a STRING hexadecimal value representing the public key of an EdDSA key-pair (algorithm 15, 16)
dnskey_pk_wire STRING hexadecimal value of the public key RDATA field for algorithms that the worker does not support yet
cdnskey_flags INTEGER flags field from CDNSKEY RDATA
cdnskey_protocol INTEGER protocol field from CDNSKEY RDATA
cdnskey_algorithm INTEGER algorithm field from CDNSKEY RDATA
cdnskey_pk_rsa_n STRING hexadecimal value representing the RSA modulus from a CDNSKEY record containing an RSA key (this value is often referred to as ‘n’ in cryptography textbooks)
cdnskey_pk_rsa_e STRING hexadecimal value representing the RSA public exponent from a CDNSKEY record containing an RSA key (this value is often referred to as ‘e’ in cryptography textbooks)
cdnskey_pk_rsa_bitsize INTEGER key length in case the CDNSKEY record contains an RSA public key
cdnskey_pk_eccgost_x STRING hexadecimal value representing part ‘X’ of an ECDSA or GOST public key in case the CDNSKEY record is for algorithms 12, 13 or 14
cdnskey_pk_eccgost_y STRING hexadecimal value representing part ‘Y’ of an ECDSA or GOST public key in case the CDNSKEY record is for algorithms 12, 13 or 14
cdnskey_pk_dsa_t STRING hexadecimal value representing part ‘T’ of a “classic” DSA key
cdnskey_pk_dsa_q STRING hexadecimal value representing part ‘Q’ of a “classic” DSA key
cdnskey_pk_dsa_p STRING hexadecimal value representing part ‘P’ of a “classic” DSA key
cdnskey_pk_dsa_g STRING hexadecimal value representing part ‘G’ of a “classic” DSA key
cdnskey_pk_dsa_y STRING hexadecimal value representing part ‘Y’ of a “classic” DSA key
cdnskey_pk_eddsa_a STRING hexadecimal value representing the public key of an EdDSA key-pair (algorithm 15, 16)
cdnskey_pk_wire STRING hexadecimal value of the public key RDATA field for algorithms that the worker does not support yet
nsec_next_domain_name STRING next name field from the RDATA of an NSEC record
nsec_owner_rrset_types STRING existing types for the next name from the RDATA of an NSEC record
nsec3_hash_algorithm INTEGER hash algorithm used for an NSEC3 RR
nsec3_flags INTEGER flags field from NSEC3 RDATA
nsec3_iterations INTEGER number of iterations from NSEC3 RDATA
nsec3_salt STRING hexadecimal representation of the salt used as input to the NSEC3 hashing operation
nsec3_next_domain_name_hash STRING hexadecimal representation of the hashed next domain name from NSEC3 RDATA
nsec3_owner_rrset_types STRING existing types for the hashed next name from the RDATA of an NSEC3 record
nsec3param_hash_algorithm INTEGER hash algorithm field from NSEC3PARAM RDATA
nsec3param_flags INTEGER flags field from NSEC3PARAM RDATA
nsec3param_iterations INTEGER number of iterations from NSEC3PARAM RDATA
nsec3param_salt STRING hexadecimal representation of the salt used as input to the NSEC3 hashing operations
spf_text STRING the SPF record data from an SPF RDATA field; this type is deprecated and no longer measured by our current workers (since 5/5/2017)
spf_hash_algorithm STRING hash algorithm used to hash the SPF RRset
spf_hash STRING hash over all SPF RRs in the set returned to the query (see txt_hash, nsset_hash, mxset_hash)
soa_mname STRING MNAME (master name server) field from SOA RDATA
soa_rname STRING RNAME (responsible person) field from SOA RDATA
soa_serial LONG serial value from SOA RDATA
soa_refresh LONG refresh value (in seconds) from SOA RDATA
soa_retry LONG retry value (in seconds) from SOA RDATA
soa_expire LONG expire value (in seconds) from SOA RDATA
soa_minimum LONG minimum value (in seconds) from SOA RDATA
rrsig_type_covered STRING query type covered by this RRSIG
rrsig_algorithm INTEGER algorithm value from RRSIG RDATA
rrsig_labels INTEGER number of labels value from RRSIG RDATA
rrsig_original_ttl LONG original TTL value from RRSIG RDATA
rrsig_signature_inception LONG signature inception timestamp (epoch) from RRSIG RDATA
rrsig_signature_expiration LONG signature expiration timestamp (epoch) from RRSIG RDATA
rrsig_key_tag INTEGER key tag value from RRSIG RDATA
rrsig_signer_name STRING signer name field from RRSIG RDATA
rrsig_signature STRING hexadecimal representation of the signature field in RRSIG RDATA
caa_flags INTEGER flags field from CAA RDATA
caa_tag STRING tag field from CAA RDATA
caa_value STRING value from CAA RDATA
tlsa_usage INTEGER usage field from TLSA RDATA
tlsa_selector INTEGER selector field from TLSA RDATA
tlsa_matchtype INTEGER matching type field from TLSA RDATA
tlsa_certdata STRING hexadecimal representation of certificate data field from TLSA RDATA
ptr_name STRING the name field from PTR RDATA; you will not encounter this in the open datasets, it is used for some separate, specific measurements
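
A few of the fields above use conventions that are easy to misread: timestamp is in milliseconds since the epoch, rtt is in seconds, and a status_code of 65535 signals a worker timeout rather than a DNS RCODE. The sketch below shows one way to interpret these values in Python. The helper names are hypothetical, and treating the timestamp as UTC is an assumption of this example, not something stated in the table.

```python
from datetime import datetime, timezone

# Special status_code value: the worker could not complete the query (timeout).
TIMEOUT_STATUS = 65535


def measurement_time(timestamp_ms: int) -> datetime:
    """Convert the millisecond epoch timestamp to a datetime (UTC assumed)."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def timed_out(status_code: int) -> bool:
    """True if status_code is the special timeout value, not a real RCODE."""
    return status_code == TIMEOUT_STATUS


def rtt_milliseconds(rtt_seconds: float) -> float:
    """Convert the rtt field (seconds) to milliseconds."""
    return rtt_seconds * 1000.0


# Made-up values for illustration only.
print(measurement_time(1500000000000).isoformat())
print(timed_out(65535), timed_out(0))
print(rtt_milliseconds(0.25))
```

Checking for the timeout value before treating status_code as an RCODE avoids counting unanswered queries as DNS error responses.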