The warehouse is where we store the raw data tables from your apps and databases in Grow. Your warehouse database keeps your data up to date and is the foundation on which datasets and metrics are built.
Contents in This Article
- How to Warehouse Your Data
- Managing Your Warehouse Connection
- Using Warehoused Data in the Dataset & Metric Builders
- Sync Settings
How to Warehouse Your Data
From a New Connection
You can warehouse data from a new connection by selecting the “+ Connection” button on the Connections Overview Page from the Data tab in Grow. This will take you to the New Connection Page, where you can select the data source you would like to warehouse. This will open the warehouse table selection flow, where you will need to enter your credentials to connect to the data source. Once connected, you can select which tables of data you want to warehouse.
From an Existing Connection
You can begin warehousing data from an existing connection by selecting a connection from the Connections Overview Page. From the Connection Details page, select the “Manage Connection” button inside the Connection Details section. This will open the warehouse table selection flow. Your data source is already connected, so you can immediately select which tables of data you want to warehouse.
Selecting Which Tables to Store
Once inside the Warehouse Table Selection flow, you will be presented with a list of tables you can warehouse within Grow.
Each table has a corresponding toggle. Turning the toggle on indicates that the table is active and will be kept up to date on a designated sync interval (see the Sync Settings section below for more details). Turning the toggle off indicates that the table is no longer being refreshed; however, the data you have already synced up to this point will still be accessible throughout Grow.
Additionally, these tables are grouped into three categories, described below: Suggested Tables, Other Tables, and Custom Tables.
Suggested Tables
For each data source (with the exception of databases), we have curated the tables our customers use most and placed them at the top of the list. These are the tables we suggest you warehouse, based on what hundreds of other customers use in Grow for that particular data source.
When warehousing a new connection, suggested tables are automatically toggled on for your convenience; however, you can toggle any of them off before you begin storing data.
Other Tables
For some data sources, particularly CRMs with custom objects, we list all other predefined tables of data in this section.
Other tables are not turned on automatically; each one must be toggled on individually to be stored.
For databases, all public schemas and their corresponding tables appear in this section.
Custom Tables
For some data sources, there are alternative ways to pull data from that data source. Popular custom tables include:
- SQL queries against a database or a data source that supports this kind of method (e.g., Salesforce SOQL)
- Customizable endpoints (e.g., HubSpot Analytics for Sources which support multiple groupings and drill-down options)
Begin Storing the Data
When you have finished selecting the data that you want stored in Grow’s Data Warehouse, click the “Sync & Store” button at the bottom.
This will begin the initial population of your warehouse with the tables you selected. Depending on the data source, populating your data warehouse for the first time can take up to several hours.
Managing Your Warehouse Connection
Editing the Warehouse Table Selections
Once you’ve set up a warehouse, you can edit which tables are actively refreshing. Find the connection that is actively warehousing data and select the “Manage Connection” button in the Connection Details section of the Connection Details page. This will reopen the Warehouse Table Selection flow, where you can change which tables are toggled on or off and edit any custom tables you have set up.
Interpreting the Warehouse Table
For some data sources, a single toggle in the Warehouse Table Selection flow populates several tables in the warehouse. For instance, the Shopify “Orders” toggle populates several tables, including “Refunds”, “Order Tax Lines”, and “Order Line Items”. These appear in the “Warehoused Tables” section on the Connection Details page.
The Warehoused Tables section shows the Warehoused Parent Tables and each of their derived Warehouse Tables. For each table, we also show the number of rows stored, the storage size of that data, when it was last updated, and its current refresh interval.
Using Warehoused Data in the Dataset & Metric Builders
Selecting a Warehoused Table
To select your warehoused data in the Metric or Dataset Builder, add a new report and go to the Connection tab in the Add Data modal. Then select which data source your warehouse is built from.
Now you will be in the Metric or Dataset Builder, and you will finish selecting what data you want to use. If you have more than one connection to the chosen data source, select the connection from the drop-down list that your warehouse is built from.
In the Data section of the Data Settings, toggle to the "Warehouse" option instead of "Direct Query". Select the warehoused report that you want to use from the drop-down menu.
Managing your Connection from Within the Builder
Select the text “Manage Warehoused Reports”. This will open the Warehouse Table Selection flow, where you can toggle on new warehoused tables. Once you toggle on a new table, the initial population will begin for that warehoused table. From the drop-down menu, you can see when the table begins processing. Once some data is stored in the warehouse, you can pull it into the builder while the rest of the warehouse finishes populating.
Note: If you sync more than one new table, you may have to wait until the first table is finished populating before subsequent tables begin populating. We suggest syncing and storing only one table at a time while in the building flow.
Sync Settings
The default sync interval is 12 hours per table. The minimum interval available in the drop-down will differ depending on the amount of data you have, the data source’s API limitations, the number of tables you are warehousing for that particular data source, and the sync strategies available to each data source. If you need to sync your data outside of the normal interval, you can also click the Full Sync Now button to manually sync your data.
For some tables, we have added the ability to incrementally sync your data in addition to having a full sync interval. Incremental syncs are a faster and more efficient syncing strategy which allows you to update only the modified data within your tables, while still periodically rebuilding your tables from scratch to ensure consistency between your source data and Grow. Incremental syncing is turned on by default for any tables that have this feature enabled.
How to Customize your Incremental and Full Sync Schedules
When full sync is the only available option, the default interval is every 12 hours. When incremental sync is enabled, the default incremental sync interval is 1 hour, while the default full sync interval is 1 week. You can update these settings by opening the Data tab and navigating to the Warehouse section of your Data Library. Find the table you want to configure and open Manage Sync Settings. For tables that do not have incremental sync available, the incremental sync option is disabled.
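To make the default schedule concrete, here is a minimal Python sketch of how the two intervals could interact. It is only an illustration, not Grow's implementation: `SyncSettings`, `default_settings`, and `next_sync` are hypothetical names, and the rules that a due full sync takes priority and that any completed sync resets the incremental timer are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SyncSettings:
    """Hypothetical container for one table's sync schedule (not a Grow API)."""
    full_interval: timedelta
    incremental_interval: Optional[timedelta] = None  # None = incremental unavailable


def default_settings(incremental_available: bool) -> SyncSettings:
    """Return the default intervals described in this article."""
    if incremental_available:
        return SyncSettings(full_interval=timedelta(weeks=1),
                            incremental_interval=timedelta(hours=1))
    return SyncSettings(full_interval=timedelta(hours=12))


def next_sync(settings: SyncSettings, now: datetime, last_full: datetime,
              last_incremental: Optional[datetime] = None) -> Optional[str]:
    """Return "full", "incremental", or None if nothing is due yet."""
    # Assumption: a due full sync takes priority over an incremental one.
    if now - last_full >= settings.full_interval:
        return "full"
    if settings.incremental_interval is not None:
        # Assumption: the incremental timer restarts after any sync.
        last_any = max(last_full, last_incremental or last_full)
        if now - last_any >= settings.incremental_interval:
            return "incremental"
    return None


# Example: two hours after a full sync, a table with incremental sync is due
# for an incremental run, while a full-sync-only table is not due for anything.
start = datetime(2024, 1, 1)
print(next_sync(default_settings(True), start + timedelta(hours=2), start))
print(next_sync(default_settings(False), start + timedelta(hours=2), start))
```

Changing an interval in Manage Sync Settings corresponds to replacing one of the `timedelta` values in this sketch.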
How to Customize your Incremental and Full Sync Schedules (for Database Data Sources)
Note: This applies to warehoused data sources classified as databases, such as Snowflake, SQL Server, Postgres, MySQL, and others.
Incremental sync is not enabled by default on these data sources, as Grow needs your input to know how to properly sync your data. To configure the sync settings, navigate to the Warehouse section of your Data Library, open the table you want to configure, and open Manage Sync Settings. Toggle on the incremental sync option. Follow the instructions for selecting the corresponding "Primary Key", "Created At", and "Updated At" columns from your database table to help Grow configure incremental sync logic unique to your table. Below is a description of how to set up each field.
- Primary Key: Find the corresponding Primary Key column in your database table and select it from the drop-down. This is often the main Id field of the table and should be unique to each row, with no duplicates. The Primary Key field is used to identify and remove duplicates, leaving only the most recent version of your data for each row in the table.
- Created At: Find the corresponding Created At column in your database table and select it from the drop-down. This allows us to identify new rows since we last synced your data.
- Updated At: Find the corresponding Updated At column in your database table and select it from the drop-down. This allows us to identify anything that has changed since we last synced your data.
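The role of the three columns can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not Grow's actual sync logic: `incremental_merge` is a hypothetical function, rows are plain dictionaries, and the column names `id`, `created_at`, and `updated_at` are placeholders for whichever columns you select in Manage Sync Settings.

```python
from datetime import datetime


def incremental_merge(warehouse_rows, source_rows, last_synced_at,
                      primary_key="id", created_at="created_at",
                      updated_at="updated_at"):
    """Merge rows created or changed since last_synced_at into warehouse_rows."""
    # Start from what is already stored, indexed by primary key.
    merged = {row[primary_key]: row for row in warehouse_rows}

    for row in source_rows:
        is_new = row[created_at] > last_synced_at      # "Created At" finds new rows
        is_changed = row[updated_at] > last_synced_at  # "Updated At" finds edits
        if not (is_new or is_changed):
            continue  # untouched since the last sync; nothing to do
        current = merged.get(row[primary_key])
        # "Primary Key" removes duplicates: keep only the most recent version.
        if current is None or row[updated_at] >= current[updated_at]:
            merged[row[primary_key]] = row

    return sorted(merged.values(), key=lambda r: r[primary_key])


# Hypothetical example data: row 1 was edited and row 3 was created
# after the last sync on January 2.
day1, day2, day3 = datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)
stored = [
    {"id": 1, "name": "Ada", "created_at": day1, "updated_at": day1},
    {"id": 2, "name": "Bob", "created_at": day1, "updated_at": day1},
]
source = [
    {"id": 1, "name": "Ada L.", "created_at": day1, "updated_at": day3},
    {"id": 2, "name": "Bob", "created_at": day1, "updated_at": day1},
    {"id": 3, "name": "Cy", "created_at": day3, "updated_at": day3},
]
result = incremental_merge(stored, source, last_synced_at=day2)
print([row["name"] for row in result])  # ['Ada L.', 'Bob', 'Cy']
```

In practice the "new or changed" filter would more likely run as a query against your database rather than in memory; the sketch only shows why each column is needed.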