Augmenting Collected Events
Upon entering the CDP, all collected events are transformed in real time to add information or to remove sensitive user information before they are persisted. The following sections describe the data we augment events with.
Metadata
The following event metadata is derived and available for all events.
Metadata | Location on the tracking event | Example |
---|---|---|
Timestamp (readable) | meta.timestamp | 2022-07-07T16:09:17.809Z |
Timestamp in milliseconds (epoch unix format) | meta.timestampMillis | 1657210157809 |
Is a valid event | meta.isValidEvent | true |
User ID | meta.userId | {"id": "1pCookie", "id": "c6ac2829223e182cc225b2278a2e2622"} |
Request made by a bot/crawler | meta.fromBot | false |
Request made by device with a blacklisted user-agent | meta.isBlacklisted | false |
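The two timestamp fields in the table encode the same instant. As a minimal sketch, and assuming both are UTC (as the trailing Z of meta.timestamp indicates), the conversion between them can be done with the standard library. The function names are hypothetical and this is not the CDP's own code:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def readable_from_millis(millis: int) -> str:
    """Render meta.timestampMillis in the meta.timestamp style (ISO 8601, UTC, 'Z')."""
    # Integer timedelta arithmetic avoids float rounding of the millisecond part
    dt = EPOCH + timedelta(milliseconds=millis)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def millis_from_readable(ts: str) -> int:
    """Parse a meta.timestamp-style string back into epoch milliseconds."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Floor-dividing two timedeltas yields an exact integer count of milliseconds
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(readable_from_millis(1657210157809))  # 2022-07-07T16:09:17.809Z
```

Using the example values from the table, both directions round-trip exactly.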
IP Address
For requests sent directly by the user's device through one of our SDKs, we use the IP address of the device that made the HTTP request. For server-to-server events or events received through one of the supported webhooks, the IP address must be explicitly included in the request. Some third-party integrations via webhooks might not support sending the IP address; refer to the details of each webhook for information on how to send IP addresses.
After extraction, the IP address is persisted in the event under the field meta.ip.
IP Anonymization
IP addresses are automatically anonymized by default.
Anonymization occurs as early as possible, before any logging or persistence takes place, and the full IP address is never stored. The anonymization principle is simple: every IP address found in the request has its last octet set to 0. For example, 123.22.22.14 becomes 123.22.22.0.
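The "last octet set to 0" rule is equivalent to masking an IPv4 address with 255.255.255.0. A minimal sketch using Python's standard ipaddress module follows; the function name is hypothetical, and since this section only describes octets, the sketch covers IPv4 only (IPv6 handling is not described here):

```python
import ipaddress


def anonymize_ipv4(ip: str) -> str:
    """Zero the last octet of an IPv4 address, e.g. 123.22.22.14 -> 123.22.22.0."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError (AddressValueError) on malformed input
    # Keep the first three octets (24 bits) and clear the final 8 bits
    masked = int(addr) & 0xFFFFFF00
    return str(ipaddress.IPv4Address(masked))


print(anonymize_ipv4("123.22.22.14"))  # 123.22.22.0
```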
Geolocation
The IP address from the user tracking event is analyzed so that we can derive geolocation information. The information obtained through this process is purposely imprecise in order to avoid tracing the address or location of a particular user.
Derived Information | Location on the tracking event | Example |
---|---|---|
Country | meta.country | United States of America |
City | meta.city | New York |
Location (latitude and longitude) | meta.location | [40.7128, -74.0060] |
Autonomous System Number (ASN) | meta.asn | COMCAST-7922 |
Postal Code | meta.postalCode | 32073 |
To provide this functionality, the CDP uses GeoLite2 databases created by MaxMind, available from https://www.maxmind.com.
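As an illustration of how GeoLite2 lookups could map onto the fields in the table above, the sketch below takes response objects shaped like those of MaxMind's geoip2 Python package (City: .country.name, .city.name, .location.latitude/.longitude, .postal.code; ASN: .autonomous_system_organization) and builds the meta.* values. The function name is hypothetical, and the exact values the CDP stores (for example the country name spelling) may differ from what this produces:

```python
from typing import Any, Optional


def geo_meta(city: Any, asn: Optional[Any] = None) -> dict:
    """Map geoip2-style City/ASN responses onto the meta.* geolocation fields.

    `city` is expected to expose .country.name, .city.name, .location.latitude,
    .location.longitude and .postal.code; `asn` is expected to expose
    .autonomous_system_organization. Both shapes are assumptions based on geoip2.
    """
    meta = {
        "country": city.country.name,
        "city": city.city.name,
        "location": [city.location.latitude, city.location.longitude],
        "postalCode": city.postal.code,
    }
    # The ASN lives in a separate GeoLite2 database, so it is optional here
    if asn is not None:
        meta["asn"] = asn.autonomous_system_organization
    return meta
```

With the geoip2 package installed, the inputs could come from `geoip2.database.Reader("GeoLite2-City.mmdb").city(ip)` and `geoip2.database.Reader("GeoLite2-ASN.mmdb").asn(ip)`; the database file names are assumptions. Looking up the already-anonymized address keeps the result coarse.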
Entrypoint
The meta.entrypoint field can be used to identify how the event was collected by the CDP.
Under meta.entrypoint.type we label the channel through which the event reached the CDP:
tag - the event was collected directly by either the JavaScript tag or the Android/iOS SDK.
server - the event entered the CDP through the server-to-server endpoint.
templated - the event entered the system through one of the templated webhooks.
source - similar to templated, but currently it only applies to the integration with Segment.
Entrypoints of type templated and source can also contain an additional field, name, which states the third-party integration from which the event originated.
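When filtering or grouping events by how they arrived, it can help to collapse meta.entrypoint into a single label. A minimal sketch, assuming meta.entrypoint is a JSON object with a type key and an optional name key as described above; the function name and the example integration name "segment" are hypothetical:

```python
from typing import Optional

# The four entrypoint types documented above
ENTRYPOINT_TYPES = {"tag", "server", "templated", "source"}


def entrypoint_label(meta: dict) -> str:
    """Return 'tag', 'server', or '<type>:<name>' for templated/source entrypoints."""
    entrypoint = meta.get("entrypoint") or {}
    ep_type = entrypoint.get("type")
    if ep_type not in ENTRYPOINT_TYPES:
        raise ValueError(f"unknown entrypoint type: {ep_type!r}")
    name: Optional[str] = entrypoint.get("name")
    # Only templated and source entrypoints may carry a third-party integration name
    if ep_type in ("templated", "source") and name:
        return f"{ep_type}:{name}"
    return ep_type


print(entrypoint_label({"entrypoint": {"type": "source", "name": "segment"}}))  # source:segment
```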
Device Information
We use the value of the User-Agent HTTP header to determine the user's device. As with the IP address, the User-Agent is fetched directly from events sent through one of our SDKs, but it has to be sent explicitly if the request entered through a server-to-server endpoint or a third-party webhook.
We store the following information under meta.user-device:
user-agent-family
user-agent-major
user-agent-minor
os-family
os-major
os-minor
device-family
device-type
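The field names above resemble the output of the open-source ua-parser project, which splits a User-Agent into user agent, OS and device parts. The sketch below assumes an input dict shaped like the result of ua-parser's `user_agent_parser.Parse()` and flattens it into the meta.user-device keys; it is not necessarily the parser the CDP uses. ua-parser does not classify device type, so device-type is passed in separately here; the function name is hypothetical:

```python
from typing import Optional


def user_device_meta(parsed: dict, device_type: Optional[str] = None) -> dict:
    """Flatten a ua-parser style result into the meta.user-device keys.

    `parsed` is assumed to look like {"user_agent": {...}, "os": {...}, "device": {...}}.
    """
    ua = parsed.get("user_agent", {})
    os_ = parsed.get("os", {})
    device = parsed.get("device", {})
    return {
        "user-agent-family": ua.get("family"),
        "user-agent-major": ua.get("major"),
        "user-agent-minor": ua.get("minor"),
        "os-family": os_.get("family"),
        "os-major": os_.get("major"),
        "os-minor": os_.get("minor"),
        "device-family": device.get("family"),
        # Supplied by the caller: ua-parser itself has no device-type concept
        "device-type": device_type,
    }
```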
Interaction Type
We classify each event type according to an interaction type. This information is stored under meta.interactionType.
The interaction type can be one of:
passive - events of type activationRequest, matchRequest, and cookieSyncRequest;
outbound - events of type adView, emailDelivery, and emailSend;
active - all other event types.
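The classification above is a plain lookup with active as the fallback. A minimal sketch (the function name is hypothetical; the event type names are the ones listed above):

```python
PASSIVE = frozenset({"activationRequest", "matchRequest", "cookieSyncRequest"})
OUTBOUND = frozenset({"adView", "emailDelivery", "emailSend"})


def interaction_type(event_type: str) -> str:
    """Return the meta.interactionType value for an event type."""
    if event_type in PASSIVE:
        return "passive"
    if event_type in OUTBOUND:
        return "outbound"
    # Every event type not listed above is treated as active
    return "active"
```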
Product Data
When an event contains one or more products, the received product details are complemented with additional information about each product if a Product Feed is available. More information on how to enable and set up this feature can be found in the Offline Imports section.
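"Complemented" suggests the feed fills in details the event did not carry. The sketch below shows one way to do that under stated assumptions: products and feed entries are matched on a hypothetical "id" key, and values received on the event win over feed values. The real matching key and precedence rules are described in the Offline Imports section, not here:

```python
from typing import Dict, List


def enrich_products(products: List[dict], feed: Dict[str, dict]) -> List[dict]:
    """Complement each received product with its Product Feed entry, if any."""
    enriched = []
    for product in products:
        # Hypothetical match on "id"; products missing from the feed pass through unchanged
        feed_entry = feed.get(product.get("id"), {})
        # Later keys win, so event values take precedence and the feed only fills gaps
        enriched.append({**feed_entry, **product})
    return enriched
```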
Currency Conversion
We convert every field that contains a monetary amount to the default system currency. We support a variety of currencies, and the conversion rates used are updated daily. All fields under meta.data should have their amounts converted to the system default currency.
We only convert the currency for fields that are part of our events schema. Custom fields are not converted.
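A minimal sketch of the conversion, assuming a daily rate table that maps each currency code to units of the default currency per unit, and a hypothetical list of schema fields that hold amounts. Custom fields are skipped, mirroring the rule above. Decimal is used to avoid binary floating-point error on money; rounding to two decimal places is an assumption, since not every currency uses cents:

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict

# Hypothetical list of schema fields holding monetary amounts
SCHEMA_MONEY_FIELDS = ("total", "tax", "shipping")


def convert_amount(amount: Decimal, currency: str, rates: Dict[str, Decimal],
                   default_currency: str) -> Decimal:
    """Convert amount into default_currency using rates[currency] (default units per 1 unit)."""
    if currency == default_currency:
        return amount
    rate = rates[currency]  # KeyError signals an unsupported currency
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def convert_schema_fields(data: dict, rates: Dict[str, Decimal], default_currency: str) -> dict:
    """Return converted schema money fields (e.g. for meta.data); custom fields are ignored."""
    currency = data["currency"]
    return {
        field: convert_amount(Decimal(str(data[field])), currency, rates, default_currency)
        for field in SCHEMA_MONEY_FIELDS
        if field in data
    }
```

For example, with a rate of 0.9 EUR per USD, an event with total 10 in USD yields total 9.00 in EUR, while a custom field on the same event is left out of the converted set.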
Origin
We add some metadata about how the event originated under meta.origin. This field encompasses various factors that might have led to the occurrence of the event, and it contains the following fields:
source - Where the user came from. Possible values:
  direct - The user accessed the website directly;
  email - The event is related to an email event;
  none - The event occurred while the user was navigating the website, after having already entered through some other channel;
  google - The user accessed the website through one of Google's channels (such as AdWords and DoubleClick);
  Other values - Derived from UTM parameters or from the web page the user was on before being directed to your website;
medium - The type of traffic or tool used to get to the website;
campaign - The ad campaign that originated this event;
keyword - Any keyword related to a possible ad that originated this event;
content - Used to differentiate the content of a possible ad that originated this event.
These fields can be derived from the event using the following prioritized list:
# | Conditions by priority | Extracted origin |
---|---|---|
1 | Event contains utm_* query parameters (at least utm_source) | source: value from utm_source; medium: value from utm_medium; campaign: value from utm_campaign; content: value from utm_content; keyword: value from utm_term |
2 | Event contains gclid query parameter | source: google; medium: cpc adwords |
3 | Event contains gclsrc query parameter | source: google; medium: cpc doubleclick |
4 | Event type is email* | source: email; campaign: value from campaignId |
5 | Event is missing a referrer | source: direct |
6 | Event referrer is a search engine | source: domain from referrer; medium: organic search |
7 | Event referrer is a payment platform | source: none |
8 | Event referrer is blacklisted | source: none |
9 | Any other referrer | source: domain from referrer; medium: referral; content: path from referrer |
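The rules in the table are evaluated top to bottom, and the first match determines meta.origin. The sketch below encodes that order under several assumptions: the search-engine, payment-platform and blacklist host sets are hypothetical placeholders, the event is reduced to its type, page URL, referrer and campaignId, and "cpc adwords"/"cpc doubleclick" are read as medium cpc plus a channel label whose target field is not specified, so only cpc is emitted:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical host lists: the CDP's actual lists are not documented in this section
SEARCH_ENGINES = {"www.google.com", "www.bing.com", "duckduckgo.com"}
PAYMENT_PLATFORMS = {"www.paypal.com"}
BLACKLISTED_REFERRERS = {"spam.example"}

# UTM query parameter -> meta.origin field (rule 1)
UTM_FIELDS = {
    "utm_source": "source",
    "utm_medium": "medium",
    "utm_campaign": "campaign",
    "utm_content": "content",
    "utm_term": "keyword",
}


def derive_origin(event_type: str, page_url: str, referrer: Optional[str],
                  campaign_id: Optional[str] = None) -> dict:
    """Apply the prioritized origin rules; the first matching rule wins."""
    query = parse_qs(urlparse(page_url).query)

    # 1. UTM parameters, as long as utm_source is present
    if "utm_source" in query:
        return {field: query[param][0] for param, field in UTM_FIELDS.items() if param in query}
    # 2-3. Google click identifiers (AdWords / DoubleClick label omitted, see lead-in)
    if "gclid" in query or "gclsrc" in query:
        return {"source": "google", "medium": "cpc"}
    # 4. Email events
    if event_type.startswith("email"):
        return {"source": "email", "campaign": campaign_id}
    # 5. No referrer at all (None or empty string)
    if not referrer:
        return {"source": "direct"}

    parsed_ref = urlparse(referrer)
    host = parsed_ref.hostname or ""
    # 6. Search engine referrer
    if host in SEARCH_ENGINES:
        return {"source": host, "medium": "organic search"}
    # 7-8. Payment platforms and blacklisted referrers map to source "none"
    if host in PAYMENT_PLATFORMS or host in BLACKLISTED_REFERRERS:
        return {"source": "none"}
    # 9. Any other referrer
    return {"source": host, "medium": "referral", "content": parsed_ref.path}
```

Ordering matters here: a page view that carries both utm_source and gclid is attributed by its UTM parameters, because rule 1 is checked first.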