The warehouse is where we store the raw data tables from your apps and databases in Grow. Your warehouse database keeps your data up-to-date and is the foundation on which datasets and metrics are built.
In this article, we will cover:
- Warehousing Your Data
- Editing the Warehouse Table Selections
- Interpreting the Warehouse Table
- Using Warehoused Data in the Metric/Dataset Builders
- Sync Settings
Warehousing Your Data
There are two ways to warehouse your data:
- Warehousing Data from a New Connection
You can warehouse data from a new connection. Once connected, you can select which tables of data you want to warehouse.
- Warehousing Data from an Existing Connection
You can begin warehousing data from an existing connection by selecting a connection from the Connections Overview Page. From the Connection Details page, select the Manage Connection button inside the Connection Details section. This will open the warehouse table selection flow. Your data source is already connected, so you can immediately select which tables of data you want to warehouse.
Understanding Warehouse Tables
Once inside the Warehouse Table Selection flow, you are presented with a list of tables you can warehouse within Grow.
Each table has a corresponding toggle. Turning the toggle on indicates that the table is active and will be kept up-to-date on a designated refresh interval (see the Sync Settings section below for more details). Turning the toggle off indicates that the table is no longer being refreshed; however, the data you have already synced up to this point will still be accessible throughout Grow.
These tables are broken into three categories: Suggested Tables, Other Tables, and Custom Tables.
- Suggested Tables
For each data source (with the exception of databases), we have curated the tables our customers use most and put them at the top of the list. These are the tables we suggest you warehouse, based on what we see hundreds of other customers using within Grow for that particular data source.
When warehousing a new connection, suggested tables are automatically toggled on for your convenience; however, you can toggle any of them off before beginning to store any data.
- Other Tables
For some data sources, particularly CRMs with custom objects, we list all other predefined tables of data in this section.
Other tables are not automatically turned on and must be toggled on individually for these tables to be stored.
For Databases, all public schemas and their corresponding tables will appear in this section.
- Custom Tables
Some data sources offer alternative ways to pull data. Popular custom tables include:
- SQL queries against a database or data source that supports this kind of method (e.g., Salesforce SOQL)
- Customizable endpoints (e.g., HubSpot Analytics for Sources, which supports multiple groupings and drill-down options)
When you have finished selecting the data that you want stored in Grow's Data Warehouse, click the Sync & Store button at the bottom.
This begins the initial population of your warehouse with the tables you defined. Depending on the data source, populating your data warehouse for the first time may take a while, even up to several hours.
Editing the Warehouse Table Selections
Once you have set up a warehouse, you can edit which tables are actively refreshing. Find the connection that is actively warehousing data and select the Manage Connection button in the Connection Details section of the Connection Details page. This reopens the Warehouse Table Selection flow, where you can change which tables are toggled on or off and edit any custom tables you have set up.
Interpreting the Warehouse Table
For some data sources, a single toggle in the warehouse table selection flow populates several tables in the warehouse. For instance, the Shopify Orders toggle populates several tables, including Refunds, Order Tax Lines, and Order Line Items. These appear in the Warehoused Tables section on the Connection Details page.
The Warehoused Tables section shows the Warehoused Parent Tables and each of their derived Warehouse Tables. We also indicate the # of Rows of data we are storing, the storage size of that data, when it was last updated, and its current refresh interval.
Using Warehoused Data in the Metric/Dataset Builders
Once the data is warehoused, you can use it in the Metric or Dataset Builders to make reports.
Selecting a Warehoused Table
To select your warehoused data in the Metric or Dataset Builder, add a new report and go to the Connection tab in the Add Data modal. Then select which data source your warehouse is built from.
You are now in the Metric or Dataset Builder, where you finish selecting the data you want to use. If you have more than one connection to the chosen data source, use the drop-down list to select the connection your warehouse is built from.
In the Data section of the Data Settings, toggle to the Warehouse option instead of Direct Query. Select the warehoused report that you want to use from the drop-down menu.
Managing your Connection from Within the Builder
Select the text Manage Warehoused Reports. This opens the Warehouse Table Selection flow, where you can toggle on new warehoused tables. Once you toggle on a new table, the initial population begins for that warehoused table. From the drop-down menu, you can see when the table begins processing. Once we have some data stored in the warehouse, you can pull it into the builder while the rest of the warehouse finishes populating.
If you sync more than one new table, you may have to wait until the first table is finished populating before subsequent tables begin populating. We suggest syncing and storing only one table at a time while in the building flow.
Sync Settings
Grow provides different options for synchronizing your data.
Full Sync
The default sync interval is 12 hours per table. The minimum interval available in the drop-down differs depending on the amount of data you have, the data source's API limitations, the number of tables you are warehousing for that particular data source, and the sync strategies available to each data source.
If you need to sync your data outside of the normal interval, you can also click the Full Sync Now button to manually sync your data.
Incremental Sync
For some tables, we have added the ability to incrementally sync your data in addition to having a full sync interval. Incremental syncs are a faster and more efficient syncing strategy which allows you to update only the modified data within your tables, while still periodically rebuilding your tables from scratch to ensure consistency between your source data and Grow.
Incremental syncing is turned on by default for any tables that have this feature enabled.
Customizing your Incremental and Full Sync Schedules
When full sync is the only available option, the default interval is every 12 hours. When incremental sync is enabled, the default incremental sync interval is 1 hour, while the default full sync interval is 1 week.
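As a quick illustration of these defaults, the sketch below models them in Python. The `SyncSchedule` class and the `default_schedule` function are hypothetical names used only for this example and are not part of Grow; the interval values are the ones stated above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SyncSchedule:
    """Hypothetical container for one table's sync intervals (not a Grow API)."""
    full_sync: timedelta
    incremental_sync: Optional[timedelta]  # None when incremental sync is unavailable


def default_schedule(incremental_enabled: bool) -> SyncSchedule:
    """Return the default intervals described in this article."""
    if incremental_enabled:
        # Incremental sync every hour, full rebuild once a week.
        return SyncSchedule(full_sync=timedelta(weeks=1),
                            incremental_sync=timedelta(hours=1))
    # Full sync is the only option: refresh every 12 hours.
    return SyncSchedule(full_sync=timedelta(hours=12), incremental_sync=None)
```

For example, `default_schedule(False)` describes a table that is fully refreshed every 12 hours and has no incremental interval.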
Here are the steps to update your Sync settings:
- Navigate to Data tab > Warehouse.
- Search and select the table you want to configure.
- Open Manage Sync Settings.
- Choose the desired Full Sync Interval from the drop-down.
- Select the Incremental Sync Interval you want from the drop-down. This option is disabled for tables that do not have incremental sync available.
- Click Save to complete the customization.
Customizing your Incremental and Full Sync Schedules for Database Data Sources
This applies to warehoused data sources classified as databases, such as Snowflake, SQL Server, Postgres, MySQL, and others.
Incremental sync is not enabled by default on these data sources, as Grow needs your input to know how to properly sync your data.
You can follow these steps to configure the sync settings:
- Navigate to the Warehouse section of your Data Library.
- Open the table you want to configure.
- Select Manage Sync Settings.
- Toggle on the Incremental Sync option.
- Follow the instructions for selecting the corresponding Updated At, Created At, and Primary Key columns from your database table to help Grow configure incremental sync logic unique to your table.
Below is a description of how to set up each field.
- Primary Key: Find the corresponding Primary Key column in your database table and select it from the drop-down. This is often the main Id field of the table and should be unique to each row, with no duplicates. The Primary Key field is used to identify and remove duplicates, leaving only the most recent version of your data for each row in the table.
- Created At: Find the corresponding Created At column in your database table and select it from the drop-down. This allows us to identify new rows since we last synced your data.
- Updated At: Find the corresponding Updated At column in your database table and select it from the drop-down. This allows us to identify anything that has changed since we last synced your data.
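To make the role of these three columns concrete, here is a minimal Python sketch of one way an incremental sync could use them. It is an illustration under stated assumptions, not Grow's actual implementation: the function names, the in-memory lists of dictionaries, the sample data, and the column names `id`, `created_at`, and `updated_at` are all hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def changed_since(source_rows: List[Row], last_sync: datetime,
                  created_col: str = "created_at",
                  updated_col: str = "updated_at") -> List[Row]:
    """Pick rows that were created or updated after the last sync."""
    return [
        row for row in source_rows
        if row[created_col] > last_sync or row[updated_col] > last_sync
    ]


def merge_incremental(warehouse_rows: List[Row], changed_rows: List[Row],
                      key_col: str = "id",
                      updated_col: str = "updated_at") -> List[Row]:
    """Merge changed rows into the warehouse, keeping one row per primary key.

    When a key appears more than once, the row with the latest
    Updated At value is kept.
    """
    latest: Dict[Any, Row] = {}
    for row in warehouse_rows + changed_rows:
        current = latest.get(row[key_col])
        if current is None or row[updated_col] >= current[updated_col]:
            latest[row[key_col]] = row
    return list(latest.values())


# Hypothetical sample data: the warehouse was last synced on January 2.
last_sync = datetime(2024, 1, 2)
jan1, jan3 = datetime(2024, 1, 1), datetime(2024, 1, 3)

warehouse = [
    {"id": 1, "name": "Acme", "created_at": jan1, "updated_at": jan1},
    {"id": 2, "name": "Globex", "created_at": jan1, "updated_at": jan1},
]
source = [
    {"id": 1, "name": "Acme Corp", "created_at": jan1, "updated_at": jan3},  # edited
    {"id": 2, "name": "Globex", "created_at": jan1, "updated_at": jan1},     # unchanged
    {"id": 3, "name": "Initech", "created_at": jan3, "updated_at": jan3},    # new
]

changed = changed_since(source, last_sync)
merged = merge_incremental(warehouse, changed)
```

In this example only rows 1 and 3 are fetched, and the merged result holds one row per Id with the newer version of row 1. A sketch like this would not notice rows deleted at the source, which is one reason a periodic full sync remains useful alongside incremental syncs.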