If you've worked with standard relational databases, the GA4 BigQuery schema will feel unusual at first. GA4 exports each event as a single row — but instead of having a column for each event parameter, parameters are stored as a nested array of key-value pairs. This structure is more flexible and more efficient for storage, but it requires a different approach to querying than you're likely used to.
This post is structured as a reference — the schema first, then the query patterns, then the common gotchas. Bookmark it. You'll use it every time you write a new GA4 BigQuery query.
Top-level schema — the main columns
Every row in the events_YYYYMMDD table represents one event. These are the top-level columns that are directly accessible without UNNEST:
| Column | Type | What it contains |
|---|---|---|
| event_date | STRING | Date the event was logged — format YYYYMMDD. Use PARSE_DATE('%Y%m%d', event_date) to convert to a DATE type for arithmetic. |
| event_timestamp | INTEGER | Event time in microseconds since Unix epoch. Divide by 1,000,000 for seconds, or use TIMESTAMP_MICROS(event_timestamp) to convert. |
| event_name | STRING | The event name — page_view, session_start, purchase, scroll, etc. This is your primary filter in almost every query. |
| event_params | RECORD (REPEATED) | Array of key-value pairs containing all event parameters. Must be UNNESTed to access. See the event_params section below. |
| event_previous_timestamp | INTEGER | Timestamp of the previous event in the same session. Useful for calculating time between events. |
| event_bundle_sequence_id | INTEGER | Sequential ID for the batch of events sent together. Useful for ordering events when timestamps are identical. |
| user_pseudo_id | STRING | Pseudonymous identifier for the user — consistent per browser/device, resets if cookies are cleared. The primary user identifier in most queries. |
| user_id | STRING | Your own user identifier if set via gtag('set', 'user_id', ...). NULL if not implemented. Use this for cross-device analysis when available. |
| user_first_touch_timestamp | INTEGER | Timestamp of the user's first ever event — useful for cohort analysis. |
| user_properties | RECORD (REPEATED) | User-scoped properties set via GA4. Same structure as event_params — requires UNNEST to access individual properties. |
| user_ltv | RECORD | Lifetime value fields — user_ltv.revenue and user_ltv.currency. Directly accessible without UNNEST. |
| device | RECORD | Device information. Access sub-fields directly: device.category, device.operating_system, device.web_info.browser. |
| geo | RECORD | Geographic data. Access as geo.country, geo.region, geo.city. |
| traffic_source | RECORD | Session acquisition source. Access as traffic_source.source, traffic_source.medium, traffic_source.name (campaign). |
| platform | STRING | Platform the event was collected on — WEB, IOS, or ANDROID. |
| ecommerce | RECORD | Ecommerce summary fields. Access as ecommerce.purchase_revenue, ecommerce.total_item_quantity. Available on purchase events. |
| items | RECORD (REPEATED) | Item-level ecommerce data. One entry per product — requires UNNEST for item-level analysis. |
| stream_id | STRING | The data stream ID the event came from. Useful when a property has multiple streams. |
event_params — the most important field
Every event's parameters — page URL, session ID, engagement time, custom parameters — are stored in the event_params column as a repeated record. Each entry in the array has two fields: key (a string) and value (a record with four sub-fields for different data types).
| Field | Type | When to use it |
|---|---|---|
| event_params.key | STRING | The parameter name — e.g. page_location, ga_session_id, engagement_time_msec |
| event_params.value.string_value | STRING | Use for text parameters — page_location, page_title, campaign names, custom string params |
| event_params.value.int_value | INTEGER | Use for integer parameters — ga_session_id, ga_session_number, engagement_time_msec |
| event_params.value.float_value | FLOAT | Use for float parameters — value (revenue) on some implementations |
| event_params.value.double_value | FLOAT64 | Use for double-precision float parameters — value (revenue) on most standard implementations |
value.string_value returns NULL, try value.int_value or value.double_value. The value field (revenue) on purchase events is almost always in double_value in standard GA4 implementations, but can be in float_value in some custom setups. Always verify with a sample query before building on top of it.Essential query patterns — copy and adapt these
The subquery SELECT pattern is the most common in all GA4 BigQuery work. Use it wherever you need a single parameter value from event_params.
-- Extract page_location from event_params
SELECT
event_name,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_url,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
LIMIT 10
Sessions are not a column — they're derived by counting distinct combinations of user_pseudo_id and ga_session_id. Counting just user_pseudo_id gives users. Counting just event rows gives events. Counting distinct session IDs gives sessions.
-- Correct session count
SELECT
COUNT(DISTINCT
CONCAT(
user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')
)
) AS sessions
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
-- No event_name filter needed — every session has a session_start,
-- but filtering on session_start then counting distinct IDs also works
Always filter by _TABLE_SUFFIX to control which partitioned table files are scanned. Without this, BigQuery scans every table in the dataset — expensive at scale. The suffix is the YYYYMMDD string in the table name.
-- Yesterday only
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
-- Last 7 days
WHERE _TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
-- Specific date range
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
-- From a specific date forward
WHERE _TABLE_SUFFIX >= '20240101'
The purchase event has both event_params and the ecommerce record. Revenue is accessible via both paths — ecommerce.purchase_revenue is simpler, but event_params contains the value parameter which is what GA4's own reports use. Both should match; if they don't, investigate your purchase event implementation.
-- Daily revenue and transaction count
SELECT
PARSE_DATE('%Y%m%d', event_date) AS date,
COUNT(*) AS transactions,
SUM(ecommerce.purchase_revenue) AS revenue,
-- Also accessible via event_params
SUM((SELECT value.double_value
FROM UNNEST(event_params)
WHERE key = 'value')) AS revenue_via_params,
COUNT(DISTINCT
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'transaction_id')) AS unique_transactions
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX >= '20240101'
AND event_name = 'purchase'
GROUP BY date
ORDER BY date DESC
The items column is a repeated record — one entry per product in a transaction. To analyse at the item level, UNNEST items. This creates one row per item per event rather than one row per event.
-- Top products by revenue
SELECT
item.item_id,
item.item_name,
item.item_category,
SUM(item.quantity) AS units_sold,
SUM(item.item_revenue) AS revenue
FROM
`project.analytics_ID.events_*`,
UNNEST(items) AS item
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
AND event_name = 'purchase'
GROUP BY
item.item_id, item.item_name, item.item_category
ORDER BY
revenue DESC
LIMIT 50
To analyse which events occurred within each session — building a funnel — create a session-level CTE that aggregates the events that fired per session using COUNTIF or MAX with a boolean flag. This avoids the expensive cross-join approach.
-- Checkout funnel: how many sessions reached each step
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')) AS session_id,
COUNTIF(event_name = 'view_item') > 0 AS viewed_item,
COUNTIF(event_name = 'add_to_cart') > 0 AS added_to_cart,
COUNTIF(event_name = 'begin_checkout') > 0 AS began_checkout,
COUNTIF(event_name = 'purchase') > 0 AS purchased
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY session_id
)
SELECT
COUNTIF(viewed_item) AS sessions_viewed_item,
COUNTIF(added_to_cart) AS sessions_added_to_cart,
COUNTIF(began_checkout) AS sessions_began_checkout,
COUNTIF(purchased) AS sessions_purchased
FROM sessions
traffic_source contains session-level acquisition data — source, medium, and campaign. These are top-level record fields and don't require UNNEST. Note that traffic_source reflects the first session's attribution — for cross-session attribution analysis you need a more complex approach using the collected_traffic_source event parameter.
-- Sessions and revenue by channel
SELECT
traffic_source.source,
traffic_source.medium,
traffic_source.name AS campaign,
COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))) AS sessions,
SUM(IF(event_name = 'purchase',
ecommerce.purchase_revenue, 0)) AS revenue
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY 1, 2, 3
ORDER BY sessions DESC
The ga_session_number event parameter tells you which session number this is for the user — session number 1 means a new user in GA4's terminology. This is more reliable than using user_first_touch_timestamp for segmenting new vs returning.
-- New vs returning user sessions
SELECT
CASE
WHEN (SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_number') = 1
THEN 'New'
ELSE 'Returning'
END AS user_type,
COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))) AS sessions
FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_name = 'session_start'
GROUP BY user_type
Common gotchas — mistakes that produce silent wrong results
These mistakes don't cause query errors — they produce plausible-looking numbers that are quietly wrong.
Querying without a _TABLE_SUFFIX filter
Without a date filter, BigQuery scans all historical tables. The query runs (slowly and expensively) but returns correct data — it's a cost problem, not an accuracy problem. Still, always filter. On large datasets without partition pruning, a single unfiltered query can cost tens of dollars.
Using the wrong value sub-field
If you extract value.string_value for a parameter that stores its value in value.int_value, the result is NULL rather than an error. Always run a quick verification query first: SELECT event_params FROM ... LIMIT 1 and expand the array to see which value field contains the data you need.
Counting events instead of sessions
A query that counts rows where event_name = 'session_start' gives you session-start events, not sessions. Sessions require distinct session IDs. A user who has their session_start event fire twice in a session (rare but possible with certain GTM configurations) would be counted twice with a simple COUNT. The CONCAT + COUNT DISTINCT approach is the correct one.
Not accounting for duplicate purchase events
If your site has duplicate purchase event firing, BigQuery will contain the duplicates — it records exactly what GA4 received. Before summing revenue, check for duplicate transaction IDs: SELECT transaction_id, COUNT(*) FROM ... WHERE event_name = 'purchase' GROUP BY 1 HAVING COUNT(*) > 1. If you find duplicates, deduplicate using QUALIFY ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY event_timestamp) = 1.
Confusing traffic_source with session-level source
traffic_source reflects the user's first session acquisition — not the current session's source. For current-session attribution, use the collected_traffic_source event parameter available on session_start events. This distinction matters for multi-session conversion analysis where you want to credit the session that drove the conversion, not the first-ever visit.
SELECT event_name, COUNT(*) as count FROM `project.analytics_ID.events_*` WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) GROUP BY event_name ORDER BY count DESC first. This gives you a complete inventory of events in the export and their relative frequency — essential context before writing any analysis query.