Data Warehouse (BigQuery)
- 1 Introduction
- 2 Pricing
- 3 Providing Users Access to the Data Warehouse
- 4 Google Project Location
- 5 Data Security
- 6 Data Sets & Streaming
- 7 Custom Reports
- 8 Google BigQuery tutorials
- 9 FAQ
- 9.1 What is a data warehouse?
- 9.2 What is BigQuery?
- 9.3 How does this data warehouse work?
- 9.4 What UltraCart plans support it?
- 9.5 How often does data update?
- 9.6 What data sets are supported?
- 9.7 What is the data model?
- 9.8 How does the data model differ from the REST API?
- 9.9 What tools can connect to BigQuery?
- 9.10 What query language does BigQuery support?
- 9.11 How does this data format differ from a traditional SQL database?
- 9.12 What use cases can a business solve with a data warehouse?
- 9.13 Are linked accounts supported?
- 9.14 Can I join data within the UltraCart BigQuery project with another BigQuery project?
- 9.15 What other data sets can you put in BigQuery to join up?
- 9.16 What happens as the data model evolves?
- 9.17 What about other data warehouse software other than BigQuery?
- 10 Enhancing Tables with Views
- 11 What 3rd Party Tools Are Merchants Using with BigQuery
- 12 Configuring a User to Run Queries Programmatically
- 13 Sample Queries
- 13.1 Adding Calculations to Auto Orders
- 13.2 Adding EST time zone dates to orders
- 13.3 Active Auto Order Next Rebill
- 13.4 Revenue Per Item Over a Certain Time Period
- 13.5 Query Table Size in GB
- 13.6 Order ID Experiment Variations
- 13.7 Orders For Specific Items Shipped By a User on a Specific Date
- 13.8 Auto Orders with Pre-shipment Notice Date More Than One Week Prior to Shipment
- 13.9 Orders with a Specific Custom Field Value
- 13.10 List All Upsell Paths, Variation and Upsells
- 13.11 Upsell Screenshot List
- 13.12 Auto Orders with Missed Pre-shipment Notices
- 13.13 Auto Order Logs Messages After Pre-shipment Notice and Before Shipment Should Take Place
- 13.14 Customer LTV Modeling
- 13.15 Find Order Id by Transaction Value
- 13.16 Auto Order Weekly Cohort by Main Item Id
- 13.17 Auto Order Future Predictions
- 13.18 Extract a Marketing List from Order History
- 13.19 Auto Order Churn Overall
- 13.20 Auto Order Churn by Item
- 13.21 Affiliate Metrics
- 13.22 Customer Cohort Revenue
- 13.23 Auto Order Cohort Revenue
- 13.24 Auto Order Cohort Active Count
- 13.25 Auto Order Cohort Active Percentage
- 13.26 Order Upsells
- 13.27 Affiliate Click ↔︎ Order Metrics
- 13.28 All UTMs for an Order
- 13.29 Projected Future Revenue Based Upon Initial Item Purchased (Non-Auto Orders)
- 13.30 UTM Sales By Week
- 13.31 UTM Clicks and Weighted Revenue
- 13.32 StoreFront Experiment Statistics
- 13.33 Auto Order Cohort Related Queries
- 13.34 Count of Apple Pay/Google Pay/Microsoft Pay Transactions
- 13.35 New Customers in a Specific Date Range By Email
- 13.36 Conversion Rate from Analytics Sessions
- 13.37 Auto Order Percentages with CC vs PayPal 30 Day Moving Averages
- 13.38 Coupon Usage Summary
- 13.39 Coupon Usage Detail
- 13.40 Order Count and Revenue by Hour of Day
- 13.41 Finding Expensive Queries in a Project
- 13.42 Wholesale Customer Profile LTV
- 13.43 Inventory Value at Start Of Each Month
- 13.44 Auto Order LTV calculation
- 13.45 Upsell Path Statistics
- 13.46 Find Upsell Offers Containing Text
- 13.47 Upsell Statistics by Offer and Path (across all linked accounts)
- 13.48 Loyalty Balance (Points based Program)
- 13.49 Loyalty Balance (Store Credit Based Program)
- 13.50 SKUs Sent to 3PL
- 14 Related Documentation
Introduction
UltraCart’s Data Warehouse (BigQuery) feature allows a business to quickly set up a data warehouse of its customer, item, order, and auto order information with Google’s BigQuery database service. UltraCart handles the data transfer into BigQuery on an automated basis. Businesses can then use the information to feed support systems, write reports, and join with other data systems for operational analysis.
Additional Benefits:
easy reporting in Google Data Studio / Microsoft Power BI
connect data to Microsoft Excel or Google Sheets
extract the data set using the BigQuery CLI or SDK
Pricing
UltraCart covers the first $1.00 of expense associated with the BigQuery project hosting the account data. That is enough to cover all of the data storage expense and bandwidth associated with the updates that occur over time. The actual cost of BigQuery depends solely on the amount of queries that you execute against the environment. Most merchants will only spend a few dollars a month on their data warehouse environment. Once a month, UltraCart applies the previous month’s data warehouse usage to your bill.
The full pricing for BigQuery is located here. In short, BigQuery pricing is:
$0.02/GB of storage with the first 10GB free.
$5.00/TB of data processed during queries.
There are no hourly fees for having a BigQuery data set, which makes it incredibly economical for most businesses.
Providing Users Access to the Data Warehouse
ATTENTION:
The OWNER USER is the only user that can grant access to the Data Warehouse (BigQuery) integration.
In order for a user to access your data warehouse, the owner of the UltraCart account must grant the individual user permissions. The email associated with the user must be a Google account email. User management is done under Configuration → Account & Users → Users. The permissions that can be granted to the user are:
We recommend that you grant the minimum level of data access that is absolutely necessary for the individual user.
Google Project Location
To find the location of your Data Warehouse (BigQuery) project in Google Cloud:
Navigate to Configuration → Developer → Data Warehouse (BigQuery)
This page will provide you with the project URL, data set descriptions, etc.
Data Security
UltraCart utilizes Google Cloud IAM to limit access to the BigQuery project to only those users on the account that have been granted access to it. UltraCart also utilizes BigQuery column level security to restrict access to certain columns within various tables to specific levels of users. Whenever data is restricted, a SHA-256 hash of the value will be available in another column to be used in queries instead of the underlying PII data. Having this fine grained data access allows a business owner to provide limited access to the valuable corporate data without exposing customer lists, etc. We encourage owners to only grant the lowest level of access possible to achieve the necessary business result.
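As a sketch of how the hashed columns can be used in place of the restricted PII (the column name `billing.email_sha256` and the exact hash encoding are assumptions here; check the schema of your data set for the actual names):

```sql
-- Hypothetical example: find a customer's orders without access to the raw
-- email column. The hashed column name (billing.email_sha256) is an
-- assumption, not the confirmed UltraCart schema.
SELECT order_id, creation_dts
FROM `my-data-warehouse.my_dataset.uc_orders`
WHERE billing.email_sha256 = TO_HEX(SHA256('customer@example.com'))
```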
Data Sets & Streaming
Data warehouses rarely delete data, and BigQuery is no exception, so a traditional insert/update/delete strategy is not applicable with BigQuery. Instead, UltraCart streams the changes into BigQuery tables located in the ultracart_dw_streaming data set. Streaming allows changes to appear within the data warehouse typically within less than 2 minutes.
Merchants should never attempt to query the tables in the streaming data set directly. Streaming tables contain one row per mutation of each object, so querying them will lead to unexpected results.
The other data sets are actually views that turn the streaming tables into a consistent snapshot of the data and also remove columns that are inaccessible due to data security at the column level.
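Conceptually, each view reduces the mutation stream to the latest row per object. A minimal sketch of the general technique (the actual view definitions are managed by UltraCart; the table and column names `some_table`, `object_key`, and `mutation_dts` are assumptions for illustration only):

```sql
-- Illustrative only: keep the most recent mutation per object using
-- ROW_NUMBER. Do not query the streaming tables like this in practice;
-- use the provided ultracart_dw views instead.
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY object_key
      ORDER BY mutation_dts DESC
    ) AS rn
  FROM `my-data-warehouse.ultracart_dw_streaming.some_table`
)
WHERE rn = 1
```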
Data Set | Description | Security Level |
---|---|---|
ultracart_dw | All the tables without any sensitive information. | Level 1 - Standard Access |
ultracart_dw_low | Affiliate information except tax id and software activation codes | Level 2 - Low |
ultracart_dw_medium | PII information for customers | Level 3 - Medium |
ultracart_dw_high | Tax IDs for affiliates and wholesale customers | Level 4 - High |
For merchants using linked accounts, the parent account will also have a second set of data sets. These use the same security levels, but include information from ALL the linked accounts to make consolidated reporting possible.
Data Set | Description | Security Level |
---|---|---|
ultracart_dw_linked | All the tables without any sensitive information. | Level 1 - Standard Access |
ultracart_dw_linked_low | Affiliate information except tax id and software activation codes | Level 2 - Low |
ultracart_dw_linked_medium | PII information for customers | Level 3 - Medium |
ultracart_dw_linked_high | Tax IDs for affiliates and wholesale customers | Level 4 - High |
Custom Reports
If your business does not have the technical knowledge to work with SQL or report builder tools, UltraCart Professional Services is available to write custom reports and publish them into your UltraCart account. Please contact support@ultracart.com with your report inquiries.
Google BigQuery tutorials
To learn more about using Google BigQuery, please see the following:
FAQ
What is a data warehouse?
A data warehouse is an enterprise-grade system that is used for reporting and data analysis and is a core component of business intelligence. Data warehouses are the central repositories where data from one or more different sources are brought together for analysis. Often these data stores contain both current and historical copies of data.
What is BigQuery?
BigQuery is an enterprise data warehouse that solves the problem of operating the data warehouse by enabling super-fast SQL queries using the processing power of Google's infrastructure. Simply move your data into BigQuery and let Google handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.
How does this data warehouse work?
UltraCart tracks what changes are happening to your data and streams updates into different tables within BigQuery using secure communication with your Google cloud project.
What UltraCart plans support it?
All pricing plans have access to the data warehouse.
How often does data update?
The updates stream into the data warehouse with a 1-2 minute delay.
What data sets are supported?
At this time the objects that are pushed into your data warehouse are:
affiliates
affiliate clicks
affiliate ledger
auto orders
customers
gift certificates
items
orders
rotating transaction gateway history
screen recordings
storefront customers
storefront experiments
storefront upsell offers
storefront upsell offer events
storefront upsell paths
storefronts
towerdata email intelligence
What is the data model?
Each record within a BigQuery table is a hierarchical object that is provided to BigQuery as JSON data. It is almost identical to the REST API object model, with a few differences covered below.
How does the data model differ from the REST API?
All of the date/time values on the records stored in BigQuery are in UTC. This differs from the EST time zone that UltraCart typically runs in. The other change is that String[] type values are nested one level deeper to accommodate the way BigQuery loads data.
What tools can connect to BigQuery?
There are a large variety of tools that can talk to BigQuery including:
SDKs in almost every popular language
Google Sheets
Google Data Studio
Microsoft PowerBI
Tableau
Qlik
Looker
Almost every popular analysis tool out there will have a connection to BigQuery.
What query language does BigQuery support?
BigQuery uses SQL as its query language. Documentation on its syntax is located here.
How does this data format differ from a traditional SQL database?
Traditional SQL databases do not operate on nested data within each row of the database. BigQuery fully supports nested hierarchical data within the table schema. If you’re used to SQL, but new to BigQuery make sure to look at the UNNEST function.
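For example, the nested `items` array on each order can be treated as a joined table (the item field name `merchant_item_id` is an assumption based on the UltraCart REST API object model; verify against your schema):

```sql
-- Each order row fans out to one row per line item via CROSS JOIN UNNEST.
SELECT
  o.order_id,
  item.merchant_item_id
FROM `my-data-warehouse.my_dataset.uc_orders` AS o
CROSS JOIN UNNEST(o.items) AS item
```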
What use cases can a business solve with a data warehouse?
Build complex custom reports based upon your order data.
Aggregation of order data from linked accounts into a single database table.
Scheduled reports (using Google Data Studio)
Company dashboards (using Google Data Studio)
and more…
Are linked accounts supported?
Yes, the parent account will contain separate data sets which encompass all the data across all the linked accounts.
Can I join data within the UltraCart BigQuery project with another BigQuery project?
As long as your user has permission to the data within both projects, you can write a query that references the UltraCart-hosted tables alongside your own project’s data. This authentication model and capability is what makes BigQuery incredibly powerful!
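A sketch of what such a cross-project query might look like, assuming you maintain an ad-spend table in your own project (the project, table, and column names `my-own-project.marketing.ad_spend`, `campaign`, and `spend_date` are hypothetical):

```sql
-- Sketch: join UltraCart orders against a table hosted in your own project.
-- The ad_spend table and its columns are hypothetical placeholders.
SELECT
  s.campaign,
  SUM(o.summary.total.value) AS revenue
FROM `my-data-warehouse.my_dataset.uc_orders` AS o
JOIN `my-own-project.marketing.ad_spend` AS s
  ON DATE(o.creation_dts) = s.spend_date
GROUP BY s.campaign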
What other data sets can you put in BigQuery to join up?
Common data sets that people drive into their BigQuery databases are:
Google Adwords spend
Facebook spend
What happens as the data model evolves?
If we add new fields to the data model which would change the schema, UltraCart will automatically recreate your BigQuery table instead of just performing an update.
What about other data warehouse software other than BigQuery?
At this time the data warehouse functionality is limited to BigQuery. If in the future another popular cloud native data warehouse software comes along we will consider an integration with it.
Enhancing Tables with Views
One of the powerful mechanisms of BigQuery is the ability to create a view. A view is a query of an existing table that then acts like a new table for the purposes of querying data. This is a powerful concept and one that everyone should learn. Views are not copies of your data so they are as live and reactive to changes as the base table.
There are two main scenarios where you want to use a view:
Flattening complex data into a simpler form for reporting in tools like Google Data Studio or Power BI.
Adding additional calculations to the base table for further analysis.
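A sketch of the first scenario, creating a flattened reporting view (shown here saved into a data set of your own; the target project and data set names are placeholders, and the selected fields follow the sample queries elsewhere in this document):

```sql
-- Sketch: a view that flattens nested order fields into simple columns
-- for reporting tools. Adjust project/data set names to your environment.
CREATE VIEW `my-own-project.reporting.orders_flat` AS
SELECT
  order_id,
  creation_dts,
  billing.email AS billing_email,
  summary.total.value AS order_total
FROM `my-data-warehouse.my_dataset.uc_orders`
```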
What 3rd Party Tools Are Merchants Using with BigQuery
These are some of the tools that we know merchants are using with BigQuery:
Google Data Studio - reporting
Microsoft Power BI - reporting
DBT - modeling
Make.com - ETL/integration
Hightouch - ETL/integration
Funnel - bring data into BigQuery
Looker - reporting
We find that almost all tools in the data warehouse / CDP / reporting space will talk to BigQuery.
Configuring a User to Run Queries Programmatically
In order to run queries programmatically:
Create your own Google project
Create a service account user within that project
Take the email for the service account user and add a user to UltraCart with that email, and select the appropriate data warehouse permissions in the user editor.
Once you have completed those steps, the system should provision the new user with access to the UC-hosted BigQuery project within 5 minutes.
Sample Queries
Important Note: If you’re going to write SQL queries with the nested data that is in the BigQuery tables, you’ll want to learn CROSS JOIN UNNEST and other array operations documented in the BigQuery documentation.
Adding Calculations to Auto Orders
The following query shows how to add:
Count of the number of rebills
Life time value of the customer
Average order value of the customer
How long the customer was active before the auto order was canceled or disabled
You can run this query to see the results. Then save this query as a view named “uc_auto_orders_with_stats”.
SELECT
-- Count number of rebills
ARRAY_LENGTH(rebill_orders) as number_of_rebills,
-- Life time value by adding original order plus rebill values
COALESCE(original_order.summary.total.value, 0) - COALESCE(original_order.summary.total_refunded.value, 0) +
COALESCE((
select SUM(COALESCE(r.summary.total.value, 0) - COALESCE(r.summary.total_refunded.value, 0)) from UNNEST(rebill_orders) as r
), 0) as life_time_value,
-- Average order value by adding original order plus rebill values then dividing by total number of orders
ROUND(
(
COALESCE(original_order.summary.total.value, 0) - COALESCE(original_order.summary.total_refunded.value, 0) +
COALESCE((
select SUM(COALESCE(r.summary.total.value, 0) - COALESCE(r.summary.total_refunded.value, 0)) from UNNEST(rebill_orders) as r
), 0)
) / (1 + ARRAY_LENGTH(rebill_orders))
,2) as average_order_value,
-- How long did the auto order last? Value will be null if still active
DATE_DIFF(COALESCE(canceled_dts, disabled_dts), original_order.creation_dts, DAY) as duration_days,
-- Retain all the other columns on uc_auto_orders
*
FROM `my-data-warehouse.my_dataset.uc_auto_orders`
Adding EST time zone dates to orders
BigQuery is going to treat all dates as the UTC time zone by default. This can cause some challenges when performing certain types of reports. The following query can be used to create a view “uc_orders_est” which contains fields that are shifted into the EST time zone.
select
DATETIME(TIMESTAMP(o.creation_dts), "America/New_York") as creation_datetime_est,
DATETIME(TIMESTAMP(o.payment.payment_dts), "America/New_York") as payment_datetime_est,
DATETIME(TIMESTAMP(o.shipping.shipping_date), "America/New_York") as shipping_datetime_est,
DATETIME(TIMESTAMP(o.refund_dts), "America/New_York") as refund_datetime_est,
DATE(TIMESTAMP(o.creation_dts), "America/New_York") as creation_date_est,
DATE(TIMESTAMP(o.payment.payment_dts), "America/New_York") as payment_date_est,
DATE(TIMESTAMP(o.shipping.shipping_date), "America/New_York") as shipping_date_est,
DATE(TIMESTAMP(o.refund_dts), "America/New_York") as refund_date_est,
o.*
from `my-data-warehouse.my_dataset.uc_orders` as o
Active Auto Order Next Rebill
SELECT
auto_order_oid,
auto_order_code,
status,
enabled,
original_order.order_id as original_order_id,
orders.current_stage as original_order_current_stage,
original_order.billing.first_name,
original_order.billing.last_name,
original_order.billing.email,
original_order.shipping.day_phone_e164,
original_order.shipping.state_region,
original_order.shipping.postal_code,
original_order.shipping.country_code,
item_offset,
future_schedules.item_id as future_schedule_item_id,
future_schedules.shipment_dts as future_schedule_shipment_dts,
future_schedules.unit_cost as future_schedule_unit_cost,
future_schedules.rebill_count as future_schedule_rebill_count,
REGEXP_EXTRACT(failure_reason, '^[^\n]+') as failure_reason,
FROM `my-data-warehouse.my_dataset.uc_auto_orders` as auto_orders
JOIN `my-data-warehouse.my_dataset.uc_orders` as orders on orders.order_id = auto_orders.original_order_id,
UNNEST(auto_orders.items) as items WITH OFFSET AS item_offset,
UNNEST(items.future_schedules) as future_schedules WITH OFFSET AS future_schedule_offset
WHERE enabled = true
and future_schedule_offset = 0
and orders.payment.payment_dts is not null
and orders.payment.payment_method <> 'Quote Request'
and orders.current_stage <> 'Rejected'
and (items.no_order_after_dts is null or items.no_order_after_dts >= current_date())
ORDER BY future_schedules.shipment_dts ASC
Revenue Per Item Over a Certain Time Period
In this query we’re interacting with the orders table. We want to join the items (which are a nested array) as another table, which we do with the CROSS JOIN UNNEST operation. Once the tables are joined, it’s a fairly standard GROUP BY and SUM query.
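A sketch of the query described above (the item-level field names `merchant_item_id`, `quantity`, and `total_cost_with_discount` are assumptions based on the REST API object model, and the date range is a placeholder; adjust both to your schema and reporting period):

```sql
-- Sketch: revenue per item over a date range via CROSS JOIN UNNEST.
-- Item field names are assumptions; verify against your table schema.
SELECT
  item.merchant_item_id,
  SUM(item.quantity) AS units_sold,
  SUM(item.total_cost_with_discount.value) AS revenue
FROM `my-data-warehouse.my_dataset.uc_orders` AS o
CROSS JOIN UNNEST(o.items) AS item
WHERE DATE(o.creation_dts) BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY item.merchant_item_id
ORDER BY revenue DESC
```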
Query Table Size in GB
This query will show you the size of the tables you have in BigQuery.
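One common way to do this uses BigQuery’s `__TABLES__` metadata (the data set name below is a placeholder; run it once per data set you want to measure):

```sql
-- Table sizes in GB from BigQuery's __TABLES__ metadata for one data set.
SELECT
  table_id,
  ROUND(size_bytes / POW(1024, 3), 2) AS size_gb
FROM `my-data-warehouse.ultracart_dw_streaming.__TABLES__`
ORDER BY size_bytes DESC
```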
Order ID Experiment Variations
This query will show which experiment and variation each order went through (this requires screen recordings).
Orders For Specific Items Shipped By a User on a Specific Date
Auto Orders with Pre-shipment Notice Date More Than One Week Prior to Shipment
Orders with a Specific Custom Field Value
List All Upsell Paths, Variation and Upsells
Upsell Screenshot List
Auto Orders with Missed Pre-shipment Notices
Auto Order Logs Messages After Pre-shipment Notice and Before Shipment Should Take Place
Customer LTV Modeling
Find Order Id by Transaction Value
Auto Order Weekly Cohort by Main Item Id
Auto Order Future Predictions
Extract a Marketing List from Order History
Auto Order Churn Overall
Auto Order Churn by Item
Affiliate Metrics
Customer Cohort Revenue
Auto Order Cohort Revenue
Auto Order Cohort Active Count
Auto Order Cohort Active Percentage
Order Upsells
Which upsells were seen by an order and what were the outcomes?