Augmenting Collected Events
Upon entering the CDP, all collected events are transformed in real time to add information or to remove sensitive user information before they are persisted. The following sections describe the data we augment events with.
The following event metadata is derived and available for all events.
| Metadata | Location on the tracking event | Example |
|---|---|---|
| Timestamp in milliseconds (epoch unix format) | | |
| Is a valid event | | |
| Request made by a bot/crawler | | |
| Request made by device with a blacklisted user-agent | | |
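Two of these fields lend themselves to a short illustration. The Python sketch below shows how a millisecond epoch timestamp could be produced and how a User-Agent could be checked against a blacklist. The fragment list and the function names are hypothetical; the CDP's actual blacklist and matching rules are not documented here.

```python
import time

# Hypothetical blacklist fragments; the CDP's real list is not documented here.
BLACKLISTED_UA_FRAGMENTS = ("curl/", "python-requests/")


def epoch_millis() -> int:
    """Current time as milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def is_blacklisted_user_agent(user_agent: str) -> bool:
    """True if the User-Agent contains a blacklisted fragment (case-insensitive)."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in BLACKLISTED_UA_FRAGMENTS)
```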
For requests sent directly by the user's device through one of our SDKs, we use the IP address of the device that executed the HTTP request. For server-to-server events, or events received through one of the supported webhooks, the IP address must be explicitly included in the request. Some third-party integrations via webhooks might not support sending the IP address; refer to the details of each webhook for information on how to send IP addresses.
After the IP address is extracted, it is persisted in the event under the field
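A minimal sketch of the IP selection rule described above. The "via_sdk" flag and the "ip" payload key are hypothetical names used only for this illustration; the actual field expected from server-to-server requests and webhooks is defined in their respective documentation.

```python
from typing import Optional


def resolve_client_ip(via_sdk: bool, remote_addr: str, payload: dict) -> Optional[str]:
    """Choose the IP address to attach to an incoming event.

    via_sdk and the "ip" payload key are hypothetical names for this sketch.
    """
    if via_sdk:
        # SDK requests come straight from the device: use the connection's address.
        return remote_addr
    # Server-to-server and webhook events must state the IP explicitly;
    # None means the sender (e.g. a webhook that cannot send it) omitted it.
    return payload.get("ip")
```

Note that for non-SDK requests the connection address is deliberately ignored, since it belongs to the sending server rather than the user's device.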
IP addresses are anonymized by default.
Anonymization occurs as early as possible, before any logging or persistence takes place, so the full IP address is never stored. The anonymization rule is simple: every IP address found on the request has its last octet set to 0. For example, 184.108.40.206 becomes 184.108.40.0.
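The last-octet rule can be sketched with Python's standard ipaddress module. This covers IPv4 only, because the rule is stated in terms of octets; how IPv6 addresses are handled is not described here, and the function name is illustrative.

```python
import ipaddress


def anonymize_ipv4(ip: str) -> str:
    """Set the last octet of an IPv4 address to 0."""
    addr = ipaddress.IPv4Address(ip)  # raises AddressValueError on invalid input
    # Clearing the low 8 bits zeroes the fourth octet.
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFF00))
```

For example, anonymize_ipv4("184.108.40.206") returns "184.108.40.0".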
The IP address from the tracking event is analyzed to derive geolocation information. The information obtained through this process is deliberately imprecise, so that it cannot be used to trace the address or location of a particular user.
| Derived Information | Location on the tracking event | Example |
|---|---|---|
| Location (latitude and longitude) | | |
| Autonomous System Number (ASN) | | |
To provide this functionality, the CDP uses the GeoLite2 databases created by MaxMind, available from https://www.maxmind.com.
The meta.entrypoint field identifies how the event was collected by the CDP.
In meta.entrypoint.type, we label the channel through which the event reached the CDP:
server - the event entered the CDP through the server-to-server endpoint.
templated - the event entered the CDP through one of the templated webhooks.
source - similar to templated, but currently it only applies to the integration with Segment.
Entrypoints of type source can also contain an additional field, name, which states the third-party integration the event originated from.
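As an illustration, a consumer of stored events might summarize this field as follows. The sketch assumes the event is a dictionary with the nesting meta.entrypoint.type and meta.entrypoint.name described above; the function name and the "segment" value used in the example are hypothetical.

```python
def describe_entrypoint(event: dict) -> str:
    """Summarize meta.entrypoint, e.g. "server" or "source (segment)"."""
    entrypoint = event.get("meta", {}).get("entrypoint", {})
    kind = entrypoint.get("type")
    if kind is None:
        return "unknown"
    if kind == "source" and "name" in entrypoint:
        # Only source entrypoints carry the originating integration's name.
        return f"source ({entrypoint['name']})"
    return kind
```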
We use the value of the User-Agent HTTP header to identify the user's device. As with the IP address, the User-Agent is fetched directly from events sent through one of our SDKs, but it must be explicitly sent if the request entered through a server-to-server endpoint or a third-party webhook.
We store the following information under
We classify each event type according to an interaction type. This information is stored under
The interaction type can be one of:
passive - events with type
outbound - events with type
active - all other event types.
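The classification is a simple lookup with a fallback. Because the event types belonging to each category are not listed here, the sketch takes them as parameters; the type names in the usage example ("page_view", "email_sent", "purchase") are hypothetical, and the sets are assumed to be disjoint.

```python
from typing import AbstractSet


def interaction_type(event_type: str, passive_types: AbstractSet[str],
                     outbound_types: AbstractSet[str]) -> str:
    """Map an event type to "passive", "outbound" or "active"."""
    if event_type in passive_types:
        return "passive"
    if event_type in outbound_types:
        return "outbound"
    # Everything not listed as passive or outbound counts as active.
    return "active"
```

For instance, interaction_type("purchase", {"page_view"}, {"email_sent"}) falls through to "active".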
When an event contains one or more products and a product feed is available, the received product details are complemented with the product information stored in the CDP. More information on how to enable and set up this feature can be found in the Offline Imports section.
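A minimal sketch of this enrichment, assuming the product feed is indexed by a product identifier. The "id" key and the precedence rule (values received with the event win over stored feed values) are assumptions made for illustration; the source only states that received details are complemented.

```python
from typing import Dict, List


def enrich_products(products: List[dict], feed: Dict[str, dict],
                    id_key: str = "id") -> List[dict]:
    """Complement each received product with the matching product-feed entry."""
    enriched = []
    for product in products:
        stored = feed.get(product.get(id_key), {})
        # Assumption: feed data only fills gaps; values received with the event win.
        enriched.append({**stored, **product})
    return enriched
```

Products without a matching feed entry pass through unchanged.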
We convert every field that contains a monetary value to the system's default currency. A variety of currencies are supported, and the conversion rates are updated daily. All fields under
have their values converted to the system's default currency.
Currency is converted only for fields that are part of our event schema; custom fields are not
converted. The unaltered payload, with the original currency, can be found under
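The conversion rules above (schema fields only, original payload preserved) can be sketched as follows. The "currency" and "original_payload" keys, the field list and the rate table are hypothetical; the real location of the unaltered payload is the field referenced above.

```python
from copy import deepcopy
from decimal import Decimal
from typing import Dict, Iterable


def convert_currency_fields(event: dict, schema_fields: Iterable[str],
                            rates: Dict[str, Decimal], default_currency: str) -> dict:
    """Return a copy of the event with schema money fields in default_currency.

    rates[code] is the value of one unit of `code` expressed in default_currency.
    "currency" and "original_payload" are hypothetical key names.
    """
    converted = deepcopy(event)
    # Keep the unaltered payload, in its original currency, alongside the result.
    converted["original_payload"] = deepcopy(event)
    rate = rates[event["currency"]]
    for field in schema_fields:
        # Only schema fields are converted; custom fields are left untouched.
        if converted.get(field) is not None:
            converted[field] = Decimal(str(converted[field])) * rate
    converted["currency"] = default_currency
    return converted
```

Decimal is used instead of float so that monetary amounts do not pick up binary rounding errors during conversion.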
We add metadata about how the event originated under meta.origin. This field captures the factors that might have led to the event and contains the following fields:
source - where the user came from. Possible values:
  direct - the user accessed the website directly;
  none - the event occurred while the user was navigating the website, after having already entered through some other channel;
  other values - derived from UTM variables or from the web page the user was on before being directed to your website;
medium - the type of traffic or tool used to reach the website;
campaign - the ad campaign that originated this event;
keyword - any keyword related to a possible ad that originated this event;
content - used to differentiate the content of a possible ad that originated this event.
These fields are derived from the event using the following prioritized list:
| # | Conditions by priority | Extracted origin |
|---|---|---|
| 1 | Event contains | source: value from; medium: value from; campaign: value from; content: value from; keyword: value from |
| 2 | Event contains | source: google; medium: cpc adwords |
| 3 | Event contains | source: google; medium: cpc doubleclick |
| 4 | Event type is | source: email; campaign: value from |
| 5 | Event is missing a referrer | source: direct |
| 6 | Event referrer is a search engine | source: domain from; medium: organic search |
| 7 | Event referrer is a payment platform | source: none |
| 8 | Event referrer is blacklisted | source: none |
| 9 | Any other referrer | source: domain from |
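The priority list behaves like a chain of rules where the first match wins. The Python sketch below illustrates that structure under several loud assumptions: the exact conditions for rows 1 to 4 are not stated here, so it uses the standard utm_* query parameters for row 1, the gclid and dclid click identifiers for rows 2 and 3, and a hypothetical "email_click" event type for row 4. The event keys ("url", "referrer", "type", "email_campaign") and the domain lists are likewise hypothetical, and "domain from" is taken to mean the referrer's host name.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical lists: the CDP's real search-engine, payment-platform and
# blacklisted-referrer lists are not documented here.
SEARCH_ENGINES = {"www.google.com", "www.bing.com", "duckduckgo.com"}
PAYMENT_PLATFORMS = {"www.paypal.com"}
BLACKLISTED_REFERRERS = {"spam.example"}

# Assumed mapping from standard UTM query parameters to meta.origin fields.
UTM_TO_ORIGIN = {
    "utm_source": "source",
    "utm_medium": "medium",
    "utm_campaign": "campaign",
    "utm_content": "content",
    "utm_term": "keyword",
}


def derive_origin(event: dict) -> dict:
    """Apply the priority list top to bottom; the first matching rule wins.

    Event keys "url", "referrer", "type" and "email_campaign", the "gclid"/"dclid"
    parameters and the "email_click" type are hypothetical stand-ins.
    """
    query = parse_qs(urlparse(event.get("url") or "").query)
    params = {key: values[0] for key, values in query.items()}

    # 1. UTM parameters present on the page URL.
    if any(key in params for key in UTM_TO_ORIGIN):
        return {field: params[key] for key, field in UTM_TO_ORIGIN.items() if key in params}
    # 2-3. Paid Google traffic, identified here by click-ID parameters.
    if "gclid" in params:
        return {"source": "google", "medium": "cpc adwords"}
    if "dclid" in params:
        return {"source": "google", "medium": "cpc doubleclick"}
    # 4. Email events.
    if event.get("type") == "email_click":
        return {"source": "email", "campaign": event.get("email_campaign")}

    referrer = event.get("referrer")
    # 5. No referrer at all.
    if not referrer:
        return {"source": "direct"}
    domain = urlparse(referrer).hostname or ""
    # 6. Organic search.
    if domain in SEARCH_ENGINES:
        return {"source": domain, "medium": "organic search"}
    # 7-8. Payment platforms and blacklisted referrers are not treated as origins.
    if domain in PAYMENT_PLATFORMS or domain in BLACKLISTED_REFERRERS:
        return {"source": "none"}
    # 9. Any other referrer.
    return {"source": domain}
```

Ordering matters: a visit from a search engine whose URL also carries UTM parameters is attributed to the UTM values, because rule 1 is checked first.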