🔄 ETL Workflow (Extract, Transform, Load)

A robust ETL (Extract, Transform, Load) process is key to effective data visualization. Here's how I approach it in practice, focusing mainly on Observable Plot:

1. Extract

Data visualization begins with discovering and capturing data. Depending on the scale of the project, I look for ways to obtain the necessary data efficiently and on as small a budget as possible (preferably free!). It is important to know what the next step, "Transform", can handle, and to narrow the extract down to the data that will be core to that processing.

2. Transform

The basic format is flat, but a lightly nested structure can also be used for faceted or group-based visualizations.
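As a rough illustration of the "lightly nested" shape, the sketch below groups flat rows by one key using plain JavaScript. The field names (region, year, sales) and the helper name groupRows are hypothetical, chosen only for the example. Note that Observable Plot can facet flat data directly through its fx and fy channels, so a step like this is only needed when the nesting itself is useful.

```javascript
// Hypothetical flat rows; the field names are illustrative only.
const flatRows = [
  { region: "North", year: 2022, sales: 10 },
  { region: "North", year: 2023, sales: 14 },
  { region: "South", year: 2022, sales: 7 },
];

// Group flat rows by one key into [{ key, values }] entries,
// preserving the order in which each key first appears.
function groupRows(rows, keyName) {
  const groups = new Map();
  for (const row of rows) {
    const key = row[keyName];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

const byRegion = groupRows(flatRows, "region");
```

Each entry in byRegion keeps its original flat rows under values, so it is easy to flatten back if a later step needs the tidy form again.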

When using BigQuery, estimate the query cost before executing it (for example, with a dry run). Avoid SELECT \* and specify only the necessary columns to reduce costs.
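A minimal sketch of turning a byte estimate into a rough cost figure. It assumes on-demand billing per TiB scanned and that the byte count comes from a BigQuery dry run; the function name estimateQueryCost is hypothetical, and the price is passed in as a parameter rather than hard-coded, since rates vary by region and change over time.

```javascript
const BYTES_PER_TIB = 2 ** 40;

// Estimate query cost from a byte count.
// bytesProcessed: number or numeric string (assumed to come from a dry run).
// pricePerTiB: the rate you are billed per TiB scanned (an assumption; check your own pricing).
function estimateQueryCost(bytesProcessed, pricePerTiB) {
  const bytes = Number(bytesProcessed);
  if (!Number.isFinite(bytes) || bytes < 0) {
    throw new RangeError("bytesProcessed must be a non-negative number");
  }
  return (bytes / BYTES_PER_TIB) * pricePerTiB;
}
```

Comparing the estimate for a SELECT \* query against one that lists only the needed columns gives a quick sense of how much the narrower query saves.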

Tip: Distinguish between "" (no data) and "-"/"N/A"/"None" (an explicit "none" choice). The distinction can influence parsing and filtering logic.
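One way to keep that distinction through parsing is to classify each raw cell before converting it. This is only a sketch: the helper name classifyCell and the kind labels are hypothetical, and the set of "explicit none" markers should be adjusted to whatever the source data actually uses.

```javascript
// Markers treated as an explicit "none" choice (an assumption; adapt to your data).
const EXPLICIT_NONE = new Set(["-", "N/A", "None"]);

// Classify a raw cell as missing, explicitly none, or a real value.
function classifyCell(raw) {
  const text = raw == null ? "" : String(raw).trim();
  if (text === "") return { kind: "missing", value: null };
  if (EXPLICIT_NONE.has(text)) return { kind: "none", value: null };
  return { kind: "value", value: text };
}
```

Filtering on kind then makes it possible to drop rows with no data while still counting the rows where "none" was a deliberate answer.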

ETL is iterative, so expect to refactor often. Use Observable’s REPL-like nature to quickly prototype, visualize, and refine 🚀

3. Load

As a general rule, the preferred data format is the unpivoted, tidy (long) format. In Observable, this means a flat (non-nested) JavaScript array.

Observable Plot recommends the long format, but the wide format with multiple series is also supported for some marks (such as lineY and areaY). Choose whichever format fits the data and the chart.
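When the source arrives in wide format, unpivoting it into long format is a small transform. The sketch below uses plain JavaScript; the helper name toLong, the sample columns (date, apples, oranges), and the output field names series and value are all hypothetical.

```javascript
// Hypothetical wide rows: one column per series.
const wideRows = [
  { date: "2024-01-01", apples: 3, oranges: 5 },
  { date: "2024-01-02", apples: 4, oranges: 2 },
];

// Unpivot: emit one { idKey, series, value } row per (row, series column) pair.
function toLong(rows, idKey, valueKeys) {
  return rows.flatMap((row) =>
    valueKeys.map((key) => ({
      [idKey]: row[idKey],
      series: key,
      value: row[key],
    }))
  );
}

const longRows = toLong(wideRows, "date", ["apples", "oranges"]);
```

After parsing the dates, rows shaped like longRows could be passed to a single mark, for example Plot.lineY(longRows, {x: "date", y: "value", stroke: "series"}), instead of adding one mark per wide column.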