Controlling Costs in ZettaBlock
This page describes best practices for controlling costs in ZettaBlock.
Within ZettaBlock, there are three areas where you can optimize performance and cost:
- Query Builder optimizations
- API Builder optimizations
- Other optimizations
Optimizing Query Builder compute units
In the Query Builder, compute units correspond to the total amount of data scanned and the time each query takes to fully execute.
ZettaBlock’s data lake uses columnar storage formats. Queries should be written to take advantage of columnar formats to reduce the total amount of data scanned.
ZettaBlock uses the Presto query engine, so queries should also be optimized for Presto.
Take advantage of partitions
Best Practice: Filter query results on partitioned columns (almost always data_creation_date).
Limiting the query scope by partition is the most effective way to limit the data scanned. With a filter on the partition column, the query engine scans only the partitions that match the filter instead of the entire table.
Within ZettaBlock, tables are partitioned by data_creation_date.
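The effect of partition pruning can be illustrated with a small Python simulation. It models a table as one partition per data_creation_date, with made-up partition sizes, and counts how many bytes a query would read with and without a date filter. This is a hypothetical model of the idea, not ZettaBlock's or Presto's actual scan logic.

```python
from datetime import date, timedelta

# Hypothetical table: one partition per data_creation_date, each ~2 GiB.
# The sizes are illustrative only.
PARTITION_BYTES = 2 * 1024**3
partitions = {date(2023, 1, 1) + timedelta(days=i): PARTITION_BYTES for i in range(365)}


def bytes_scanned(parts, start=None, end=None):
    """Return the bytes read for an optional [start, end] filter on data_creation_date.

    With no filter, every partition is read. With a filter, partitions outside
    the range are skipped (pruned) without being opened.
    """
    return sum(
        size
        for day, size in parts.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


full_scan = bytes_scanned(partitions)
one_week = bytes_scanned(partitions, date(2023, 6, 1), date(2023, 6, 7))
print(f"full scan: {full_scan / 1024**3:.0f} GiB, one week: {one_week / 1024**3:.0f} GiB")
# full scan: 730 GiB, one week: 14 GiB
```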
Column selection
Best Practice: Only query columns you need.
Reduce the amount of data scanned by selecting only the columns you need. Because ZettaBlock’s data is stored in a columnar, splittable format, the engine can read just the subset of columns a query references. Avoid SELECT *, which reads every column in the table.
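The same kind of simulation shows why column selection matters in a columnar format. The column names and sizes below are hypothetical; the point is only that the bytes scanned equal the sum of the columns a query references, so SELECT * pays for every column, including wide ones you may not need.

```python
# Hypothetical per-column sizes (bytes) for one partition of a transactions-style table.
# In a columnar format each column is stored separately, so the engine reads
# only the columns a query references.
COLUMN_BYTES = {
    "hash": 400_000_000,
    "from_address": 300_000_000,
    "to_address": 300_000_000,
    "value": 150_000_000,
    "input": 5_000_000_000,  # in this made-up example, raw calldata dominates the size
    "block_time": 80_000_000,
}


def columnar_bytes_scanned(selected):
    """Bytes read by a query that references only the `selected` columns."""
    if selected == ["*"]:
        selected = list(COLUMN_BYTES)
    return sum(COLUMN_BYTES[col] for col in selected)


select_star = columnar_bytes_scanned(["*"])
select_needed = columnar_bytes_scanned(["from_address", "value", "block_time"])
print(f"SELECT *: {select_star:,} bytes; three columns: {select_needed:,} bytes")
```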
Prefer smaller tables
Best Practice: Query the smallest table that meets your requirements.
Querying smaller tables reduces the total amount of data scanned. The largest (and therefore most expensive) tables are trace tables.
To lower compute costs, prefer the smallest table that surfaces the data you need. Don’t use traces if logs will suffice. Likewise, if you only need transaction-level data, prefer transaction tables over logs or traces.
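One way to make this rule mechanical is sketched below: given the columns a query needs, pick the smallest table that exposes all of them. The catalog (table names, relative sizes, and column lists) is entirely hypothetical and does not reflect ZettaBlock's actual schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    name: str
    relative_size: int  # hypothetical size ranking, not measured values
    columns: frozenset


# Hypothetical catalog; real ZettaBlock table names and schemas will differ.
CATALOG = [
    TableInfo("traces", 100, frozenset({"tx_hash", "from_address", "to_address", "value", "call_type"})),
    TableInfo("logs", 30, frozenset({"tx_hash", "contract_address", "topics", "data"})),
    TableInfo("transactions", 10, frozenset({"tx_hash", "from_address", "to_address", "value", "gas_used"})),
]


def smallest_table_for(required):
    """Return the name of the smallest table that exposes every required column."""
    candidates = [t for t in CATALOG if set(required) <= t.columns]
    if not candidates:
        raise ValueError(f"no table exposes all of {sorted(required)}")
    return min(candidates, key=lambda t: t.relative_size).name


print(smallest_table_for({"from_address", "to_address", "value"}))  # transactions
print(smallest_table_for({"call_type"}))  # traces
```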
Make use of abstraction tables
Best Practice: Leverage abstraction tables to save both engineering time and compute costs.
ZettaBlock provides a number of abstraction tables for common query patterns. DEX and NFT tables are two examples.
Abstraction tables let you surface events without querying large trace tables. This saves engineering time and significantly lowers compute unit usage.
Move repeated queries to materialized views via API Builder
Best Practice: Materialize query results via ZettaBlock’s API Builder.
Materialized views store precomputed query results. Querying them dramatically reduces costs because the query no longer has to scan the entire history.
This tactic is especially useful for aggregate metrics, such as total number of NFTs minted by day.
API Builder supports incremental data refresh, so you can keep the underlying data as fresh as your use case requires.
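As a minimal sketch of the idea, assuming a hypothetical list of raw mint events, the code below precomputes the "NFTs minted by day" metric once and then answers queries from that small result. In ZettaBlock the precomputation would be an API Builder query; the Python here only models why reading the materialized result is cheaper than rescanning raw events.

```python
from collections import Counter
from datetime import date

# Hypothetical raw mint events: (data_creation_date, token_id).
mint_events = [
    (date(2023, 5, 1), 1), (date(2023, 5, 1), 2),
    (date(2023, 5, 2), 3),
    (date(2023, 5, 3), 4), (date(2023, 5, 3), 5), (date(2023, 5, 3), 6),
]


def materialize_daily_mints(events):
    """Compute "NFTs minted by day" once; the result plays the role of the materialized view."""
    return dict(Counter(day for day, _ in events))


def mints_on(view, day):
    """Answer a query from the small precomputed view instead of rescanning raw events."""
    return view.get(day, 0)


daily_mints = materialize_daily_mints(mint_events)
print(mints_on(daily_mints, date(2023, 5, 3)))  # 3
print(f"{len(mint_events)} raw rows vs {len(daily_mints)} materialized rows")
```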
Leverage Data Preview
Best Practice: Explore tables via ZettaBlock’s data preview feature.
You do not need to run queries to explore data. ZettaBlock provides a data preview for every table on the platform.
While queries will incur compute units, table previews will not.
Optimize SQL Operations
Finally, you can reduce the cost of ORDER BY:
- Performance Issue: By default, Presto executes ORDER BY on a single worker. This increases both the memory required and the query execution time.
- Best Practice: Use ORDER BY with LIMIT. Each worker then sorts and limits its own data, so the final step merges at most LIMIT rows from each worker instead of sorting the entire result on one worker.
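LIMIT helps because a top-N result can be computed in a distributed way: each worker keeps only its local top N, and the final step merges those short lists. The sketch below models this in Python with hypothetical worker shards and checks that the distributed result matches a single global sort. It illustrates the general technique, not Presto's internal implementation.

```python
import heapq
import random

# Hypothetical data: sort keys spread across 4 workers.
random.seed(0)
shards = [[random.randint(0, 1_000_000) for _ in range(10_000)] for _ in range(4)]
LIMIT = 10


def global_sort_then_limit(shards, n):
    """Without a pushed-down LIMIT: one worker receives every row and sorts it all."""
    all_rows = [row for shard in shards for row in shard]
    return sorted(all_rows, reverse=True)[:n]


def distributed_top_n(shards, n):
    """ORDER BY ... LIMIT n: each worker keeps only its local top n,
    then the final step merges at most n rows from each worker."""
    partial = [heapq.nlargest(n, shard) for shard in shards]
    merged = [row for part in partial for row in part]
    return heapq.nlargest(n, merged), len(merged)


top, rows_merged = distributed_top_n(shards, LIMIT)
assert top == global_sort_then_limit(shards, LIMIT)
print(f"final step merged {rows_merged} rows instead of {sum(map(len, shards))}")
```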
Optimizing API Builder compute units
Use Incremental Refresh
Best Practice: Enable incremental refresh when possible.
Incremental refresh preserves previously computed data while updating the view.
Without incremental refresh, the entire table must be rebuilt on every refresh. With incremental refresh, only new or changed data is scanned and updated.
This can lower the total compute unit cost of maintaining a table by up to 99.99%. You can learn more about incremental refresh best practices in the ZettaBlock documentation.
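A common way to implement incremental refresh is a date watermark: keep the previously computed rows and recompute only the partitions on or after the last processed date. The sketch below is a generic, hypothetical model of that pattern (the source rows and function names are assumptions), not ZettaBlock's Incremental SQL Code.

```python
from datetime import date

# Hypothetical source rows: (data_creation_date, amount).
source = [
    (date(2023, 5, 1), 10),
    (date(2023, 5, 1), 5),
    (date(2023, 5, 2), 7),
]


def full_refresh(rows):
    """Rebuild the whole daily-total view by scanning every source row."""
    view = {}
    for day, amount in rows:
        view[day] = view.get(day, 0) + amount
    return view, len(rows)


def incremental_refresh(view, rows, watermark):
    """Keep days before the watermark; recompute the watermark day onward.

    Returns (new_view, new_watermark, rows_scanned). In a real engine the
    date filter below would be a partition filter, so older partitions
    would not be read at all.
    """
    fresh = [(day, amount) for day, amount in rows if day >= watermark]
    updated = {day: total for day, total in view.items() if day < watermark}
    for day, amount in fresh:
        updated[day] = updated.get(day, 0) + amount
    new_watermark = max((day for day, _ in rows), default=watermark)
    return updated, new_watermark, len(fresh)


# Initial build, then new rows arrive (including a late row for 2023-05-02).
view, _ = full_refresh(source)
watermark = max(day for day, _ in source)
source += [(date(2023, 5, 2), 3), (date(2023, 5, 3), 4)]
view, watermark, scanned = incremental_refresh(view, source, watermark)
print(view, watermark, scanned)
```

Recomputing the watermark day itself, rather than starting strictly after it, is a deliberate choice: it picks up rows that arrive late for a day that was only partially loaded at the previous refresh.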
Prefer longer data refresh intervals
Best Practice: Select the longest data refresh interval that meets requirements.
When an API refreshes, the API Transformation Code (or Incremental SQL Code for APIs with incremental refresh enabled) is executed. Shorter data refresh intervals will trigger more query executions and result in higher cost.
To lower the amount of data scanned, select the longest data refresh interval that meets your requirements.
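The cost impact of the refresh interval is simple arithmetic: the number of executions scales inversely with the interval. The sketch below assumes, hypothetically, that every refresh consumes the same fixed number of compute units; real per-refresh costs will vary, especially with incremental refresh.

```python
from datetime import timedelta


def refreshes_per_month(interval, days=30):
    """Number of scheduled refreshes in a month of `days` days."""
    return int(timedelta(days=days) / interval)


# Hypothetical: assume every refresh costs the same fixed number of compute units.
CU_PER_REFRESH = 50

for interval in (timedelta(minutes=15), timedelta(hours=1), timedelta(days=1)):
    runs = refreshes_per_month(interval)
    print(f"every {interval}: {runs} refreshes, ~{runs * CU_PER_REFRESH:,} CUs per month")
```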
Choose a reasonable dataset size
Best Practice: Keep the resulting dataset size under 10 GB, and keep the number of records under 10 million for best performance.
Other compute unit optimizations
Prune unused queries
Best Practice: Delete unused queries.
APIs are typically built with automatic data refresh: at each interval, a query runs to append the latest data.
Deleting queries you no longer use stops these recurring executions and the compute units they consume.
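If you keep some record of when each scheduled query's results were last used, finding pruning candidates is straightforward. The ScheduledQuery fields, the 30-day idle threshold, and the compute unit figures below are all hypothetical; ZettaBlock may not expose this metadata in this form.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledQuery:
    # Hypothetical metadata; ZettaBlock may track this differently or not at all.
    name: str
    refreshes_per_day: int
    cu_per_refresh: int
    last_used: date  # last time anyone read this query's results


def prune_candidates(queries, today, max_idle_days=30):
    """Return (name, daily compute units saved) for queries idle longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        (q.name, q.refreshes_per_day * q.cu_per_refresh)
        for q in queries
        if q.last_used < cutoff
    ]


queries = [
    ScheduledQuery("daily_dex_volume", 24, 40, date(2023, 6, 28)),
    ScheduledQuery("old_nft_experiment", 96, 25, date(2023, 3, 1)),
]
print(prune_candidates(queries, today=date(2023, 7, 1)))
```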
Optimizing Query Builder compute units for private data
Best Practice: Use the Zetta dbt repo to upload data into ZettaBlock.
ZettaBlock’s platform allows you to unify your data.
If you upload data through our dbt repo, it is optimized automatically.
If you upload data in other ways, we recommend the following optimizations to take advantage of ZettaBlock’s cost controls:
- Convert data into a columnar format.
- Partition data by date.
- Compress data in a splittable format (we recommend Apache Parquet).
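For the partition-by-date step, one widely used convention is a Hive-style directory layout, where each partition is a directory named key=value. The sketch below only groups hypothetical records into such paths; the root directory, the column names, and whether ZettaBlock expects this exact layout for uploaded data are all assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_paths(records, root="my_private_data"):
    """Group records by Hive-style date partition: <root>/data_creation_date=YYYY-MM-DD.

    Each group would then be written as Parquet file(s) in its directory, so a
    date filter lets the engine skip whole directories.
    """
    groups = defaultdict(list)
    for rec in records:
        path = PurePosixPath(root) / f"data_creation_date={rec['data_creation_date']}"
        groups[str(path)].append(rec)
    return dict(groups)


# Hypothetical private records; the column names are assumptions.
records = [
    {"data_creation_date": "2023-05-01", "user_id": 1, "points": 10},
    {"data_creation_date": "2023-05-01", "user_id": 2, "points": 4},
    {"data_creation_date": "2023-05-02", "user_id": 1, "points": 7},
]
for path, rows in partition_paths(records).items():
    print(path, len(rows))
```

Writing the actual Parquet files is best left to a library. For example, pyarrow's `pyarrow.parquet.write_to_dataset` with `partition_cols=["data_creation_date"]` writes Hive-style directories of Parquet files.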
Select the right plan size
Best Practice: Pre-buy the compute units you need to lower the cost per compute unit.
Selecting the right usage plan can significantly reduce costs.
ZettaBlock’s pricing is value-oriented and usage-based. While the rest of this page describes ways to reduce the number of compute units you use, you can also lower the cost per compute unit by selecting the right plan size.
Compute units used beyond your plan are charged at full price, without a discount. If you find yourself exceeding your plan limits, consider increasing the number of pre-bought compute units.
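The arithmetic behind this recommendation is sketched below. All prices and plan sizes are made up for illustration and are not ZettaBlock's actual rates; the only rule taken from this page is that usage beyond the plan is billed at the full, undiscounted price.

```python
def monthly_cost(cu_used, plan_cus, plan_price, full_price_per_cu):
    """Plan price covers up to plan_cus compute units; the rest is billed at full price."""
    overage = max(0, cu_used - plan_cus)
    return plan_price + overage * full_price_per_cu


# Made-up numbers for illustration only; NOT ZettaBlock's actual rates.
FULL_PRICE_PER_CU = 0.01          # $ per compute unit with no plan discount
SMALL_PLAN = (100_000, 800.0)     # (pre-bought CUs, plan price): $0.008 per CU
LARGE_PLAN = (200_000, 1_400.0)   # $0.007 per CU

usage = 180_000
small_cost = monthly_cost(usage, *SMALL_PLAN, FULL_PRICE_PER_CU)
large_cost = monthly_cost(usage, *LARGE_PLAN, FULL_PRICE_PER_CU)
print(f"small plan: ${small_cost:,.2f}, large plan: ${large_cost:,.2f}")
```

With these hypothetical numbers, the larger plan costs less at 180,000 compute units even though it leaves 20,000 pre-bought units unused, because the smaller plan pays full price for 80,000 units of overage.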
Updated 11 months ago