e-Receipts - Product Overview
About
Our proprietary email receipt dataset covers 140+ merchants in retail and e-commerce with a stable 5-year history. The data is obtained with the panel's consent, using an integration with Gmail & Outlook, and offers unique insights into the digital economy.
Our Email Receipt Data product stands out because of how the data is acquired: it is sourced primarily from a panel of users who actively consent to share their e-receipts through our B2C apps. This approach ensures the data's authenticity and quality and sets it apart from traditional data collection methods.
This rich dataset caters to various use cases and verticals within the digital economy. It is particularly valuable for businesses, market researchers, and analysts seeking insights into consumer behavior, market trends, and competitors. The primary use cases are company performance, customer profiling, market research, advertising effectiveness, and product development.
In our broader data offering, this e-receipt product serves as a foundational element. It provides real-world transaction data, which can be seamlessly integrated with other data sources to create a comprehensive, multi-dimensional view of consumer behavior and market dynamics. Whether utilised on its own or in conjunction with our other datasets, e-receipt data plays a pivotal role in helping businesses make data-driven decisions and gain a competitive edge in the digital landscape.
Schema
| Name | Type | Nullable | Description |
|---|---|---|---|
| DATETIME | TIMESTAMP | N | The date and time (YYYY-MM-DD hh:mm:ss), in Coordinated Universal Time (UTC), at which the user received the email in their inbox. This is used as a proxy for when the transaction occurred. |
| ENTITY_NAME | STRING | N | The merchant of the transaction (e.g., Amazon). |
| ITEM_TITLE | STRING | Y | The name or description of the line item as displayed on the e-Receipt. If absent, it is represented as an empty string. |
| ITEM_QUANTITY | INTEGER | Y | The quantity of items purchased for the given ITEM_TITLE. |
| ITEM_PRICE | INTEGER | Y | The unit price of the item, in minor units (e.g., cents) of the original currency, as displayed on the e-Receipt. It may or may not include product-level discounts and taxes. |
| ITEM_ORIGINAL_PRICE | INTEGER | Y | The raw price of the item as presented by the merchant on the e-Receipt, in minor units (e.g., cents) of the original currency. Depending on the merchant, this may be the unit price or an aggregate for the line. It may or may not include product-level discounts and taxes. |
| IS_ITEMISED | BOOLEAN | N | Whether the e-Receipt is itemised (i.e., it reports the individual items in the basket) or not (i.e., it reports only the transaction, without item-level information). If FALSE, ITEM_TITLE is an empty string, and ITEM_PRICE and ITEM_QUANTITY are reported as 0. |
| BASKET_VALUE | INTEGER | Y | The sum of ITEM_PRICE multiplied by ITEM_QUANTITY over all line items, as reported on the e-Receipt. This calculation may or may not take into account product-level discounts and taxes. |
| TOTAL_TAX | INTEGER | Y | The tax paid on the transaction, as reported on the e-Receipt. A value of 0 means no tax value was found on the e-Receipt. |
| DISCOUNT_PRICE | INTEGER | Y | The total discount applied to the transaction. A negative value represents a discount; a positive value represents a surcharge (an additional charge imposed by the merchant). If loyalty points or similar are used to pay for items, they are represented as a discount after conversion from points to currency. |
| SHIPPING_PRICE | INTEGER | Y | The delivery (shipping) charge for the items on the e-Receipt. A value of 0 means no shipping charge was found on the e-Receipt. |
| GIFT_VOUCHER_PRICE | INTEGER | Y | The portion of the total amount paid with a gift voucher. A value of 0 means no voucher value was found on the e-Receipt. |
| TRANSACTION_VALUE | INTEGER | N | The total amount spent on the transaction, in the local currency given by CURRENCY. |
| CURRENCY | STRING | Y | The currency in which the transaction was made. |
| ORDER_REFERENCE | STRING | Y | The merchant's identifier for the transaction, also known as the order number. |
| TRANSACTION_ID | STRING | N | A permanent, unique ID of 36 alphanumeric characters, assigned by Gener8 to each unique email at the time the purchased item information is stored. |
| ORDER_ID | STRING | N | The order identifier of the e-Receipt, composed of ORDER_REFERENCE and TRANSACTION_ID. Used to identify an order when multiple e-Receipts are observed in one email. |
| ITEM_ID | STRING | Y | A permanent, unique ID assigned by Gener8 at the time the purchased item information is stored, in the form ORDER_REFERENCE-integer or TRANSACTION_ID-integer. This is null when IS_ITEMISED is FALSE. |
| INBOX_ID | STRING | N | A permanent, unique ID of 36 alphanumeric characters, assigned by Gener8 to each unique inbox when the user enrols on any of the Gener8 products. It is linked to the email address of the USER_ID. A single USER_ID may have multiple INBOX_IDs. |
| USER_ID | STRING | N | A permanent, unique user ID of 36 alphanumeric characters, assigned by Gener8 when the user enrols on any of the Gener8 products. |
| RECEIVED_AT | TIMESTAMP | N | The date and time (YYYY-MM-DD hh:mm:ss), in Coordinated Universal Time (UTC), at which Gener8 ingested the e-Receipt. |
| PROCESSED_DATE | TIMESTAMP | N | The date and time (YYYY-MM-DD hh:mm:ss) at which the receipt was processed. Receipts may be reprocessed multiple times as we improve accuracy and quality. |
N.B. Monetary values are always given in the minor currency unit, e.g. 1 USD is represented as 100.
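Because every monetary field is an integer in minor units, reporting and aggregation need an explicit conversion step. The sketch below recomputes BASKET_VALUE from line items (ITEM_PRICE multiplied by ITEM_QUANTITY, summed, with nulls treated as 0) and converts an amount to major units. It assumes CURRENCY holds ISO 4217 codes and that the minor-unit exponent follows ISO 4217 (for example, 0 for JPY); the lookup table and function names are hypothetical. If the data instead uses a fixed factor of 100 for every currency, the exponent should simply be 2.

```python
from decimal import Decimal

# Assumption: CURRENCY holds ISO 4217 codes and amounts use the ISO 4217
# minor-unit exponent. Only a few example currencies are listed here.
MINOR_UNIT_DIGITS = {"USD": 2, "GBP": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_major_units(amount_minor: int, currency: str) -> Decimal:
    """Convert an integer amount in minor units (e.g. cents) to major units."""
    digits = MINOR_UNIT_DIGITS.get(currency, 2)  # fall back to 2 decimal places
    return Decimal(amount_minor).scaleb(-digits)


def basket_value(items: list) -> int:
    """Sum ITEM_PRICE * ITEM_QUANTITY over one order's line items (minor units).

    Nullable fields are treated as 0, matching how non-itemised rows report 0.
    """
    return sum(
        (item.get("ITEM_PRICE") or 0) * (item.get("ITEM_QUANTITY") or 0)
        for item in items
    )


# Usage: two line items of one hypothetical USD order.
items = [
    {"ITEM_PRICE": 1999, "ITEM_QUANTITY": 2},
    {"ITEM_PRICE": 550, "ITEM_QUANTITY": 1},
]
total = basket_value(items)          # 4548 (minor units)
print(to_major_units(total, "USD"))  # 45.48
```

Using `Decimal` rather than `float` keeps the converted amounts exact, which matters when summing many small prices.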
Travel and Bookings
For customers interested in travel and booking-related purchases, we provide additional fields containing specific information relevant to travel and bookings.
Travel and Booking Breakdown
Travel and booking merchants are categorized into three sub-categories:
- Mass Transit: Includes Plane, Train, Ferry, Coaches
- Accommodation: Includes Hotels, B&Bs, Flat/House Rentals
- Rideshare: Includes services like Uber, Lyft, and other ride-hailing platforms
The following additional fields are included where available for the merchant:
| Name | Type | Nullable | Available For | Description |
|---|---|---|---|---|
| TRAVEL_ORIGIN_COUNTRY | STRING | Y | Mass Transit | The country where the journey begins. |
| TRAVEL_ORIGIN_CITY | STRING | Y | Mass Transit | The city where the journey begins. |
| TRAVEL_START_DATE | TIMESTAMP | Y | All | The precise start date of the journey. |
| TRAVEL_DESTINATION_COUNTRY | STRING | Y | Mass Transit, Accommodation | The country where the journey ends. |
| TRAVEL_DESTINATION_CITY | STRING | Y | Mass Transit, Accommodation | The city where the journey ends. |
| TRAVEL_END_DATE | TIMESTAMP | Y | All | The precise end date of the journey. |
| TRAVEL_CARRIER | STRING | Y | Mass Transit | The carrier associated with the journey leg (e.g., airline or train service). |
| TRAVEL_ORIGIN_NAME | STRING | Y | Mass Transit | The name of the travel origin location (e.g., station or airport). |
| TRAVEL_DESTINATION_NAME | STRING | Y | All | The name of the travel destination location (e.g., station or airport). |
| TRAVEL_JOURNEY_DISTANCE | FLOAT | Y | Rideshare | The distance of the journey in miles. |
| TRAVEL_JOURNEY_TIME | INTEGER | Y | Rideshare | The duration of the journey in seconds. |
| TRAVEL_ACCOMMODATION_NAME | STRING | Y | Accommodation | The name of the booked accommodation. |
| TRAVEL_ACCOMMODATION_ADDRESS | STRING | Y | Accommodation | The address of the booked accommodation. |
| TRAVEL_NUMBER_OF_TRAVELLERS | INTEGER | Y | All | The number of travellers for this journey. |
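The rideshare and accommodation fields lend themselves to simple derived metrics. A minimal sketch, assuming rows are already read into Python values (float miles, integer seconds, `datetime` objects): average speed from TRAVEL_JOURNEY_DISTANCE and TRAVEL_JOURNEY_TIME, a miles-to-kilometres conversion, and the number of nights between TRAVEL_START_DATE and TRAVEL_END_DATE. Treating those two dates as check-in and check-out for Accommodation rows is an assumption, and all function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

KM_PER_MILE = 1.609344  # exact international mile


def rideshare_avg_speed_mph(
    distance_miles: Optional[float], time_seconds: Optional[int]
) -> Optional[float]:
    """Average speed in mph from distance (miles) and duration (seconds)."""
    if distance_miles is None or not time_seconds:
        return None  # missing field, or zero duration (would divide by zero)
    return distance_miles / (time_seconds / 3600)


def miles_to_km(distance_miles: float) -> float:
    """Convert TRAVEL_JOURNEY_DISTANCE from miles to kilometres."""
    return distance_miles * KM_PER_MILE


def accommodation_nights(
    start: Optional[datetime], end: Optional[datetime]
) -> Optional[int]:
    """Nights between start and end dates (assumed check-in and check-out)."""
    if start is None or end is None:
        return None
    return (end.date() - start.date()).days


# Usage with hypothetical values.
print(rideshare_avg_speed_mph(3.0, 900))  # 12.0
print(accommodation_nights(datetime(2024, 3, 1, 15, 0), datetime(2024, 3, 4, 11, 0)))  # 3
```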
Delivery
Method
- Amazon S3
- Google Cloud Storage (GCS)
- Azure Blob Storage
Frequency
- Daily
- Weekly
- Monthly
- Quarterly
- On-Demand
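Since receipts may be reprocessed (see PROCESSED_DATE), the same receipt could appear more than once if recurring deliveries are loaded incrementally; that loading pattern is an assumption here. One hypothetical way to keep only the latest version is sketched below: rows are keyed by ITEM_ID, falling back to ORDER_ID for non-itemised rows where ITEM_ID is null, and the row with the greatest PROCESSED_DATE wins. The keying rule and function names are assumptions, not a documented deduplication procedure.

```python
from datetime import datetime


def row_key(row: dict) -> str:
    # Assumption: ITEM_ID identifies an itemised row; non-itemised rows have a
    # null (or empty) ITEM_ID, so ORDER_ID is used as the key instead.
    return row["ITEM_ID"] if row.get("ITEM_ID") else row["ORDER_ID"]


def keep_latest(rows: list) -> list:
    """Keep only the most recently processed version of each row."""
    latest = {}
    for row in rows:
        key = row_key(row)
        current = latest.get(key)
        if current is None or row["PROCESSED_DATE"] > current["PROCESSED_DATE"]:
            latest[key] = row
    return list(latest.values())


# Usage: row "A-1" was reprocessed with a corrected price.
rows = [
    {"ITEM_ID": "A-1", "ORDER_ID": "A", "PROCESSED_DATE": datetime(2024, 1, 1), "ITEM_PRICE": 100},
    {"ITEM_ID": "A-1", "ORDER_ID": "A", "PROCESSED_DATE": datetime(2024, 2, 1), "ITEM_PRICE": 120},
    {"ITEM_ID": None, "ORDER_ID": "B", "PROCESSED_DATE": datetime(2024, 1, 5), "ITEM_PRICE": 0},
]
deduped = keep_latest(rows)
print(len(deduped))               # 2
print(deduped[0]["ITEM_PRICE"])   # 120
```

Keying per row keeps the logic simple, but if reprocessing changes the set of line items on a receipt, stale items would survive; replacing all rows of a TRANSACTION_ID with its latest run is an alternative under a different assumption.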
Format
Parquet + Gzip
Sample
Available on request