GenAI Events - User Guide

Client Support

Technical
[email protected]

About Gener8

Since its launch in 2018, Gener8 has been at the forefront of the “open data” movement: the belief that people should be able to control, and be rewarded for, their data. Gener8 makes it easy for users to share their digital data anonymously in exchange for rewards through its mobile and desktop apps.

This clear and transparent value exchange means that Gener8 has access to a uniquely comprehensive dataset. Gener8 upholds the highest standards of informed consent on every data set a user chooses to share with us.

Dataset

Schema

Name | Type | Optional | Description
id | STRING | N | A permanent identifier for the event
user_id | STRING | N | Permanent and unique user ID
user_agent | STRING | Y | The user agent of the browser from which the event was received
latitude | FLOAT | Y | Geocoded latitude, based on client IP address at the time
longitude | FLOAT | Y | Geocoded longitude, based on client IP address at the time
postal_code | STRING | Y | Geocoded postal code the event occurred in, based on client IP address at the time. Not available for all regions.
city | STRING | Y | Geocoded city the event occurred in, based on client IP address at the time
region | STRING | Y | Geocoded region the event occurred in, based on client IP address at the time
country | STRING | Y | Geocoded two-letter country code the event occurred in, based on client IP address at the time
received_at | TIMESTAMP | N | UTC timestamp of when Gener8 received the event
event | STRING | N | The type of event, one of 'prompt' or 'response'
content_type | STRING | N | The MIME type of the content
content | STRING | N | The content of the event
conversation_id | STRING | Y | The unique conversation identifier, used to group prompts and responses
vendor | STRING | N | The vendor of the product the event was collected from, e.g. 'OpenAI' or 'Google'
product | STRING | N | The common product name for the source of the events, e.g. 'ChatGPT' or 'AI Overview'
model | STRING | Y | The model name/version, as given by the vendor
timestamp | TIMESTAMP | Y | The UTC timestamp of when the event occurred, according to the user's device
timezone | STRING | Y | The timezone the event was made from, according to the user's device
sequence | INT | Y | The order of the message in the conversation, zero-indexed
session_id | STRING | Y | A unique session identifier for the user
package_name | STRING | Y | The app from which the event was collected
package_version | STRING | Y | The version of the app from which the event was collected
source | STRING | N | The collection source of the event
sources | STRUCT | Y | A repeated field of structs, containing sources from responses
sources.url | STRING | N | The URL of the source page
sources.title | STRING | N | The title of the source page
sources.summary | STRING | N | A short description of the source page
sources.author | STRING | Y | The author name, if available
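
As a rough illustration of how the repeated sources field might be consumed, the sketch below flattens it into one record per cited source. It assumes each row has already been loaded as a Python dict whose sources value is a list of dicts (or None when absent). The sample rows and the helper name flatten_sources are hypothetical and not part of the dataset.

```python
from typing import Any, Iterator


def flatten_sources(rows: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield one record per (event, source) pair."""
    for row in rows:
        # sources is optional, so treat a missing or null value as an empty list.
        for source in row.get("sources") or []:
            yield {
                "event_id": row["id"],
                "url": source["url"],
                "title": source["title"],
                "summary": source["summary"],
                # author is the only optional sub-field in the schema.
                "author": source.get("author"),
            }


# Hypothetical rows, for illustration only.
rows = [
    {
        "id": "456",
        "event": "response",
        "sources": [
            {"url": "https://example.com/a", "title": "A", "summary": "First page", "author": None},
            {"url": "https://example.com/b", "title": "B", "summary": "Second page"},
        ],
    },
    {"id": "123", "event": "prompt", "sources": None},
]

flat = list(flatten_sources(rows))
```

Rows without sources, such as prompts, simply contribute no records.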

Geocoded fields

The geocoded fields (country, region, city, postal_code, latitude and longitude) are inferred from the user's IP address using a third-party dataset provider. This dataset takes the latitude and longitude and translates them into an interpretable location. The dataset is updated twice a week and provides approximately 99.8% accuracy at the country level and 68% at the city level. Note that if the user browses the web through a VPN, these fields will reflect the VPN's location rather than the user's real location.

Session IDs

A new session ID is created when more than 30 minutes have elapsed since the user's last event.
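
The 30-minute rule can be sketched as follows. This is an illustrative reconstruction, not Gener8's actual implementation: the function name assign_sessions, the integer session labels, and the assumption that the input holds one user's event times already sorted chronologically are all hypothetical.

```python
from datetime import datetime, timedelta

# Gap after which a new session starts, per the rule described above.
SESSION_GAP = timedelta(minutes=30)


def assign_sessions(timestamps: list[datetime]) -> list[int]:
    """Label one user's chronologically sorted events with session numbers."""
    labels = []
    session = 0
    previous = None
    for ts in timestamps:
        # Strictly more than 30 minutes of inactivity opens a new session.
        if previous is not None and ts - previous > SESSION_GAP:
            session += 1
        labels.append(session)
        previous = ts
    return labels


start = datetime(2024, 1, 1, 9, 0)
times = [
    start,
    start + timedelta(minutes=10),  # 10-minute gap: same session
    start + timedelta(minutes=40),  # exactly 30 minutes: same session
    start + timedelta(minutes=71),  # 31 minutes: new session
]
labels = assign_sessions(times)
```

In this sketch a gap of exactly 30 minutes stays in the same session, matching the "more than 30 minutes" wording.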

Response content types

Some collection sources are able to capture both the plain-text response and the HTML, making it possible to understand formatting and to extract inline links. These responses are represented as two rows in which all fields are identical, including the id, except for content_type and content. For example:

id | event | content_type | content
123 | prompt | text/plain | Hello :)
456 | response | text/plain | Hi there! How can I help?
456 | response | text/html | <p>Hi there! How can I help?</p>
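
Because one response can appear as two rows, it may help to regroup rows by id before analysis. The sketch below is one possible way to do that, assuming the rows have been loaded as Python dicts. The helper name content_by_type is hypothetical, and the sample rows mirror the example above.

```python
from collections import defaultdict


def content_by_type(rows: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    """Map each event id to a {content_type: content} dict."""
    grouped: dict[str, dict[str, str]] = defaultdict(dict)
    for row in rows:
        # Rows sharing an id differ only in content_type and content.
        grouped[row["id"]][row["content_type"]] = row["content"]
    return dict(grouped)


rows = [
    {"id": "123", "event": "prompt", "content_type": "text/plain", "content": "Hello :)"},
    {"id": "456", "event": "response", "content_type": "text/plain", "content": "Hi there! How can I help?"},
    {"id": "456", "event": "response", "content_type": "text/html", "content": "<p>Hi there! How can I help?</p>"},
]

grouped = content_by_type(rows)
```

Counting the keys of grouped, rather than the rows, gives the number of distinct events.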

Events without timestamps

Due to our collection methodology, we are able to collect historical events from some sources and vendors; however, we are not able to identify when those events occurred. These events have a null value in the timestamp field. The sequence field can be used to order the messages in a conversation correctly.
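
A minimal sketch of ordering by sequence is shown below. It assumes the rows belong to a single conversation_id and have been filtered to a single content_type. The helper name order_conversation, the sample rows, and the choice to place rows with a null sequence last are assumptions made for illustration.

```python
from typing import Any


def order_conversation(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort one conversation's messages by their zero-indexed sequence."""
    # sequence is optional in the schema; rows without it are placed last (an assumption).
    return sorted(rows, key=lambda r: (r["sequence"] is None, r["sequence"] or 0))


# Hypothetical historical rows: timestamp is null, so only sequence gives the order.
rows = [
    {"id": "b", "event": "response", "sequence": 1, "timestamp": None},
    {"id": "a", "event": "prompt", "sequence": 0, "timestamp": None},
    {"id": "c", "event": "prompt", "sequence": 2, "timestamp": None},
]

ordered_ids = [row["id"] for row in order_conversation(rows)]
```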

Usage with pageviews

It's possible to combine GenAI events with Pageview events into a single feed to analyse online behavior surrounding a conversation, such as clicking on links or making web searches. There are a number of viable approaches; this simple example creates a combined dataset using a UNION in SQL:

with genai_events as (
  select
    id,
    event,
    timestamp,
    timezone,
    content,
    '' as title,
    vendor,
    user_id
  from gener8_genai_events
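  -- Keep events from today onwards; rows with a null timestamp fail the second filter and are excluded
  where date(received_at) >= current_date
  and date(timestamp) >= current_date
  -- Keep the plain-text row so responses captured as both text and HTML are not duplicated
  and content_type = 'text/plain'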
),

pageviews as (
  select
    id,
    'pageview' as event,
    timestamp,
    timezone,
    concat(url_domain, url_path, url_query) as content,
    title,
    'Browser' as vendor,
    user_id
  from gener8_pageviews
  where date(received_at) >= current_date
  and date(timestamp) >= current_date
  and user_id in (select user_id from genai_events)
),

-- UNION ALL matches columns by position, so both CTEs list their columns in the same order
merged_events as (
  select * from pageviews
  union all
  select * from genai_events
)

select *
from merged_events
order by user_id, timestamp