Data Warehouse (BigQuery)

Introduction

UltraCart’s Data Warehouse (BigQuery) feature allows for a business to quickly setup a data warehouse of their customers, items, orders and auto order information with Google’s BigQuery database service. UltraCart will handle the data transfer into BigQuery on an automated base. Businesses can then use the information to feed support systems, write reports, and join up with other data systems for operational analysis.

Additional Benefits:

  • easy reporting in Google Data Studio / Microsoft Power BI

  • connect data to Microsoft Excel or Google Sheets

  • extract querying using the BigQuery CLI or SDK to extract the data set

Providing Users Access to the Data Warehouse

ATTENTION:
The OWNER USER is the only user that can grant access to the Data Warehouse (BigQuery) integration.

In order for a user to access your data warehouse, the owner of the UltraCart account must grant the individual user permissions. The email associated with the user must be a Google account email. User management is done under Configuration → Account & Users → Users. The permissions that can be granted to the user are:

 

We recommend that you grant the minimum level of data access that is absolutely necessary for the individual user.

Google Project Location

For the location of your Data Warehouse (BigQuery) project in the Google Cloud navigate to:

Navigate to Configuration → Developer → Data Warehouse (BigQuery)

This page will provide you with the project URL, data set descriptions, etc.

Data Security

UltraCart utilizes Google Cloud IAM to limit access to the BigQuery project to only those users on the account that have been granted access to it. UltraCart also utilizes BigQuery column level security to restrict access to certain columns within various tables to specific levels of users. Whenever data is restricted, a SHA-256 hash of the value will be available in another column to be used in queries instead of the underlying PII data. Having this fine grained data access allows a business owner to provide limited access to the valuable corporate data without exposing customer lists, etc. We encourage owners to only grant the lowest level of access possible to achieve the necessary business result.

Data Sets & Streaming

Data warehouses rarely want to see data deleted, BigQuery is no exception, so a traditional strategy of inserts/deletes is not applicable with BigQuery. UltraCart streams the changes into BigQuery tables located in the ultracart_dw_streaming data set. Streaming allows for changes to appear within the data warehouse typically within less than 2 minutes.

Merchant’s should never attempt to query against the tables in the streaming data set. Streaming data sets contain one row per mutation of the object and will lead to unexpected results.

The other data sets are actually views that turn the streaming tables into a consistent snapshot of the data and also remove columns that are inaccessible due to data security at the column level.

Data Set

Description

Security Level

Data Set

Description

Security Level

ultracart_dw

All the tables without any sensitive information.

Level 1 - Standard Access

ultracart_dw_low

Affiliate information except tax id and software activation codes

Level 2 - Low

ultracart_dw_medium

PII information for customers

Level 3 - Medium

ultracart_dw_high

Tax IDs for affiliates and wholesale customers

Level 4 - High

For merchant using linked account, the parent account will also have a second set of data sets. These are the same security levels, but include information from ALL the linked accounts to make consolidated reporting possible.

Data Set

Description

Security Level

Data Set

Description

Security Level

ultracart_dw_linked

All the tables without any sensitive information.

Level 1 - Standard Access

ultracart_dw_linked_low

Affiliate information except tax id and software activation codes

Level 2 - Low

ultracart_dw_linked_medium

PII information for customers

Level 3 - Medium

ultracart_dw_linked_high

Tax IDs for affiliates and wholesale customers

Level 4 - High

Custom Reports

If your business does not have the technical knowledge to work with SQL or report builder tools, UltraCart Professional Services is available to write custom reports and publish them into your UltraCart account. Please contact support@ultracart.com with your report inquiries.

Google BigQuery tutorials

To learn more about using Google BigQuery, please see the following:

 

FAQ

What is a data warehouse?

A data warehouse is an enterprise grade system that is used for reporting and data analysis and is a core component of business intelligence. Data warehouses are the central repositories where data from one or more different sources are brought together for analysis. Often times these data stores contain both current and historical copies of data.

What is a BigQuery?

BigQuery is an enterprise data warehouse that solves the problem of operating the data warehouse by enabling super-fast SQL queries using the processing power of Google's infrastructure. Simply move your data into BigQuery and let Google handle the hard work. You can control access to both the project and your data based on your business needs, such as giving others the ability to view or query your data.

How does this data warehouse work?

UltraCart tracks what changes are happening to your data and streams updates into different tables within BigQuery using secure communication with your Google cloud project.

What UltraCart plans support it?

All pricing plans have access to the data warehouse.

How often does data update?

The updates stream into the data warehouse with a 1-2 minute delay.

What data sets are supported?

At this time the objects that are pushed into your data warehouse are:

  • affiliates

  • affiliate clicks

  • affiliate ledger

  • auto orders

  • customers

  • gift certificates

  • items

  • orders

  • rotating transaction gateway history

  • screen recordings

  • storefront customers

  • storefront experiments

  • storefront upsell offers

  • storefront upsell offer events

  • storefront upsell paths

  • storefronts

  • towerdata email intelligence

What is the data model?

Each record within the BigQuery table is a hierarchy object that is provided to BigQuery as JSON data. It is almost identical to the REST API object model with a few difference covered below.

How does the data model different from the REST API?

All of the date time values on the records stored in BigQuery are in UTC. This differs from the typical EST that UltraCart runs in. The other change is that String[] type values have to be further nested to accommodate the way BigQuery loads data in.

What tools can connect to BigQuery?

There are a large variety of tools that can talk to BigQuery including:

  • SDKs in almost every popular language

  • Google Sheets

  • Google Data Studio

  • Microsoft PowerBI

  • Tableau

  • Qlik

  • Looker

Almost every popular analysis tool out there will have a connection to BigQuery.

What query language does BigQuery support?

BigQuery works with SQL as it’s query language. Documentation on their syntax is located here.

How does this data format differ from a traditional SQL database?

Traditional SQL databases do not operate on nested data within each row of the database. BigQuery fully supports nested hierarchical data within the table schema. If you’re used to SQL, but new to BigQuery make sure to look at the UNNEST function.

What does this feature cost?

UltraCart covers the first $1.00 of expense associated with the BigQuery project hosting the account data. That is enough to cover all of the data storage expense and bandwidth associated with the updates that occur over time. The actual cost of BigQuery depends solely on the amount of queries that you execute against the environment. For most merchants they will only spend a few dollars a month on their data warehouse environment. UltraCart will once a month apply the previous month’s data warehouse usage to your bill.

The full pricing for BigQuery is located here. BigQuery is basically:

$0.02/GB of storage with the first 10GB free.
$5.00/TB of data processed during queries.

There are no hourly fees for having a BigQuery data set which makes it incredibly economical for more businesses.

What use cases can a business solve with a data warehouse?

  • Build complex custom reports based upon your order data.

  • Aggregation of order data from linked accounts into a single database table.

  • Scheduled reports (using Google Data Studio)

  • Company dashboards (using Google Data Studio)

  • and more…

Are linked accounts supported?

Yes, the parent account will contain separate data sets which encompass all the data across all the linked accounts.

Can I join data within the UltraCart BigQuery project with another BigQuery project?

As long as your user has permission to the data within both projects, you can write a query that references the UltraCart hosted tables along side of your own project’s data. This authentication schema and capability is what makes BigQuery incredibly powerful!

What other data sets can you put in BigQuery to join up?

Common data sets that people drive into their BigQuery databases are:

  • Google Adwords spend

  • Facebook spend

What happens as the data model evolves?

If we add new fields to the data model which would change the schema, UltraCart will automatically recreate your BigQuery table instead of just performing an update.

What about other data warehouse software other than BigQuery?

At this time the data warehouse functionality is limited to BigQuery. If in the future another popular cloud native data warehouse software comes along we will consider an integration with it.

Enhancing Tables with Views

One of the powerful mechanisms of BigQuery is the ability to create a view. A view is a query of an existing table that then acts like a new table for the purposes of querying data. This is a powerful concept and one that everyone should learn. Views are not copies of your data so they are as live and reactive to changes as the base table.

There are two scenarios that you want to use a view for:

  1. Flatten down the complex data into a simpler form for the purposes of reporting on in tools like Google Data Studio of Power BI.

  2. Adding additional calculations into the base table for further analysis.

What 3rd Party Tools Are Merchants Using with BigQuery

These are some of the tools that we know merchants are using with BigQuery:

We find that almost all tools in the data warehouse / CDP / reporting space will talk to BigQuery.

Configuring a User to Run Queries Programmatically

In order to run queries programmatically:

  1. Create your own Google project

  2. Create a service account user within that project

  3. Take the email for the service account user and add a user to UltraCart with that email, and select the appropriate data warehouse permissions in the user editor.

Once you have completed those steps. within 5 minutes the system should provision the new user with access to the UC hosted BigQuery project.

Sample Queries

Important Note: If you’re going to write SQL queries with the nested data that is in the BigQuery tables, you’ll want to learn CROSS JOIN UNNEST and other array operations documented in the BigQuery documentation.

Adding Calculations to Auto Orders

The following query shows how to add:

  • Count of the number of rebills

  • Life time value of the customer

  • Average order value of the customer

  • How long the customer was active before the auto order was canceled or disabled

You can run this query to see the results. Then save this query as a view named “uc_auto_orders_with_stats”.

SELECT -- Count number of rebills ARRAY_LENGTH(rebill_orders) as number_of_rebills, -- Life time value by adding original order plus rebill values COALESCE(original_order.summary.total.value, 0) - COALESCE(original_order.summary.total_refunded.value, 0) + COALESCE(( select SUM(COALESCE(r.summary.total.value, 0) - COALESCE(r.summary.total_refunded.value, 0)) from UNNEST(rebill_orders) as r ), 0) as life_time_value, -- Average order value by adding original order plus rebill values then dividing by total number of orders ROUND( COALESCE(original_order.summary.total.value, 0) - COALESCE(original_order.summary.total_refunded.value, 0) + COALESCE(( select SUM(COALESCE(r.summary.total.value, 0) - COALESCE(r.summary.total_refunded.value, 0)) from UNNEST(rebill_orders) as r ), 0) / (1 + ARRAY_LENGTH(rebill_orders)) ,2) as average_order_value, -- How long did the auto order last? Value will be null if still active DATE_DIFF(COALESCE(canceled_dts, disabled_dts), original_order.creation_dts, DAY) as duration_days, -- Retain all the other columns on uc_auto_orders * FROM `my-data-warehouse.my_dataset.uc_auto_orders`

Adding EST time zone dates to orders

BigQuery is going to treat all dates as the UTC time zone by default. This can cause some challenges when performing certain types of reports. The following query can be used to create a view “uc_orders_est” which contains fields that are shifted into the EST time zone.

select DATETIME(TIMESTAMP(o.creation_dts), "America/New_York") as creation_datetime_est, DATETIME(TIMESTAMP(o.payment.payment_dts), "America/New_York") as payment_datetime_est, DATETIME(TIMESTAMP(o.shipping.shipping_date), "America/New_York") as shipping_datetime_est, DATETIME(TIMESTAMP(o.refund_dts), "America/New_York") as refund_datetime_est, DATE(TIMESTAMP(o.creation_dts), "America/New_York") as creation_date_est, DATE(TIMESTAMP(o.payment.payment_dts), "America/New_York") as payment_date_est, DATE(TIMESTAMP(o.shipping.shipping_date), "America/New_York") as shipping_date_est, DATE(TIMESTAMP(o.refund_dts), "America/New_York") as refund_date_est, o.* from `my-data-warehouse.my_dataset.uc_orders` as o

Active Auto Order Next Rebill

SELECT auto_order_oid, auto_order_code, status, enabled, original_order.order_id as original_order_id, orders.current_stage as original_order_current_stage, original_order.billing.first_name, original_order.billing.last_name, original_order.billing.email, original_order.shipping.day_phone_e164, original_order.shipping.state_region, original_order.shipping.postal_code, original_order.shipping.country_code, item_offset, future_schedules.item_id as future_schedule_item_id, future_schedules.shipment_dts as future_schedule_shipment_dts, future_schedules.unit_cost as future_schedule_unit_cost, future_schedules.rebill_count as future_schedule_rebill_count, REGEXP_EXTRACT(failure_reason, '^[^\n]+') as failure_reason, FROM `my-data-warehouse.my_dataset.uc_auto_orders` as auto_orders JOIN `my-data-warehouse.my_dataset.uc_orders` as orders on orders.order_id = auto_orders.original_order_id, UNNEST(auto_orders.items) as items WITH OFFSET AS item_offset, UNNEST(items.future_schedules) as future_schedules WITH OFFSET AS future_schedule_offset WHERE enabled = true and future_schedule_offset = 0 and orders.payment.payment_dts is not null and orders.payment.payment_method <> 'Quote Request' and orders.current_stage <> 'Rejected' and (items.no_order_after_dts is null or items.no_order_after_dts >= current_date()) ORDER BY future_schedules.shipment_dts ASC

Revenue Per Item Over a Certain Time Period

In this query we’re interacting with the order table. We want to join the items (which are a nested array) as another table. We use the CROSS JOIN UNNEST operation to do this. Once we have the tables joined it’s a pretty standard group by and sum over type query.

Query Table Size in GB

This query will show you the size of the tables you have in BigQuery.

Order ID Experiment Variations

This query will show which experiment and variation each order went through (this requires screen recordings).

Orders For Specific Items Shipped By a User on a Specific Date

Auto Orders with Pre-shipment Notice Date more than one week prior to shipment.

Orders with a Specific Custom Field Value

List All Upsell Paths, Variation and Upsells

Upsell Screenshot List

Auto Orders with Missed Pre-shipment Notices

Auto Order Logs Messages After Pre-shipment Notice and Before Shipment Should Take Place

Customer LTV Modeling

Find Order Id by Transaction Value

Auto Order Weekly Cohort by Main Item Id

 

Auto Order Future Predictions

 

Extract a Marketing List from Order History

Auto Order Churn Overall

Auto Order Churn by Item

Affiliate Metrics

Customer Cohort Revenue

Auto Order Cohort Revenue

Auto Order Cohort Active Count

Auto Order Cohort Active Percentage

Order Upsells

Which upsells were seen by an order and what were the outcomes?

 

Affiliate Click ↔︎ Order Metrics

All UTMs for an Order

Projected Future Revenue Based Upon Initial Item Purchased (Non-Auto Orders)

UTM Sales By Week

UTM Clicks and Weighted Revenue

StoreFront Experiment Statistics

Count of Apple Pay/Google Pay/Microsoft Pay Transactions

New Customers in a Specific Date Range By Email

Conversion Rate from Analytics Sessions

Auto Order Percentages with CC vs PayPal 30 Day Moving Averages

Coupon Usage Summary

Coupon Usage Detail

Order Count and Revenue by Hour of Day

Finding Expensive Queries in a Project

Wholesale Customer Profile LTV

Inventory Value at Start Of Each Month

Auto Order LTV calculation

Related Documentation

https://towardsdatascience.com/how-to-monitor-repeat-purchase-behavior-using-google-data-studio-and-bigquery-c2b5adbe8ebb