If you've worked with standard relational databases, the GA4 BigQuery schema will feel unusual at first. GA4 exports each event as a single row — but instead of having a column for each event parameter, parameters are stored as a nested array of key-value pairs. This structure is more flexible and more efficient for storage, but it requires a different approach to querying than you're likely used to.

This post is structured as a reference — the schema first, then the query patterns, then the common gotchas. Bookmark it. You'll use it every time you write a new GA4 BigQuery query.

Top-level schema — the main columns

Every row in the events_YYYYMMDD table represents one event. These are the top-level columns that are directly accessible without UNNEST:

ColumnTypeWhat it contains
event_dateSTRINGDate the event was logged — format YYYYMMDD. Use PARSE_DATE('%Y%m%d', event_date) to convert to a DATE type for arithmetic.
event_timestampINTEGEREvent time in microseconds since Unix epoch. Divide by 1,000,000 for seconds, or use TIMESTAMP_MICROS(event_timestamp) to convert.
event_nameSTRINGThe event name — page_view, session_start, purchase, scroll, etc. This is your primary filter in almost every query.
event_paramsRECORD (REPEATED)Array of key-value pairs containing all event parameters. Must be UNNESTed to access. See the event_params section below.
event_previous_timestampINTEGERTimestamp of the previous event in the same session. Useful for calculating time between events.
event_bundle_sequence_idINTEGERSequential ID for the batch of events sent together. Useful for ordering events when timestamps are identical.
user_pseudo_idSTRINGPseudonymous identifier for the user — consistent per browser/device, resets if cookies are cleared. The primary user identifier in most queries.
user_idSTRINGYour own user identifier if set via gtag('set', 'user_id', ...). NULL if not implemented. Use this for cross-device analysis when available.
user_first_touch_timestampINTEGERTimestamp of the user's first ever event — useful for cohort analysis.
user_propertiesRECORD (REPEATED)User-scoped properties set via GA4. Same structure as event_params — requires UNNEST to access individual properties.
user_ltvRECORDLifetime value fields — user_ltv.revenue and user_ltv.currency. Directly accessible without UNNEST.
deviceRECORDDevice information. Access sub-fields directly: device.category, device.operating_system, device.web_info.browser.
geoRECORDGeographic data. Access as geo.country, geo.region, geo.city.
traffic_sourceRECORDSession acquisition source. Access as traffic_source.source, traffic_source.medium, traffic_source.name (campaign).
platformSTRINGPlatform the event was collected on — WEB, IOS, or ANDROID.
ecommerceRECORDEcommerce summary fields. Access as ecommerce.purchase_revenue, ecommerce.total_item_quantity. Available on purchase events.
itemsRECORD (REPEATED)Item-level ecommerce data. One entry per product — requires UNNEST for item-level analysis.
stream_idSTRINGThe data stream ID the event came from. Useful when a property has multiple streams.

event_params — the most important field

Every event's parameters — page URL, session ID, engagement time, custom parameters — are stored in the event_params column as a repeated record. Each entry in the array has two fields: key (a string) and value (a record with four sub-fields for different data types).

FieldTypeWhen to use it
event_params.keySTRINGThe parameter name — e.g. page_location, ga_session_id, engagement_time_msec
event_params.value.string_valueSTRINGUse for text parameters — page_location, page_title, campaign names, custom string params
event_params.value.int_valueINTEGERUse for integer parameters — ga_session_id, ga_session_number, engagement_time_msec
event_params.value.float_valueFLOATUse for float parameters — value (revenue) on some implementations
event_params.value.double_valueFLOAT64Use for double-precision float parameters — value (revenue) on most standard implementations
The value type trap: a given parameter always stores its value in one specific sub-field — but which one varies by parameter and sometimes by implementation. If value.string_value returns NULL, try value.int_value or value.double_value. The value field (revenue) on purchase events is almost always in double_value in standard GA4 implementations, but can be in float_value in some custom setups. Always verify with a sample query before building on top of it.

Essential query patterns — copy and adapt these

Pattern 01 Extract a single event parameter

The subquery SELECT pattern is the most common in all GA4 BigQuery work. Use it wherever you need a single parameter value from event_params.

-- Extract page_location from event_params
SELECT
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location')  AS page_url,

  (SELECT value.int_value
   FROM UNNEST(event_params)
   WHERE key = 'ga_session_id')  AS session_id

FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
LIMIT 10
Pattern 02 Count sessions correctly

Sessions are not a column — they're derived by counting distinct combinations of user_pseudo_id and ga_session_id. Counting just user_pseudo_id gives users. Counting just event rows gives events. Counting distinct session IDs gives sessions.

-- Correct session count
SELECT
  COUNT(DISTINCT
    CONCAT(
      user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id')
    )
  ) AS sessions

FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
-- No event_name filter needed — every session has a session_start,
-- but filtering on session_start then counting distinct IDs also works
Pattern 03 Date filtering with _TABLE_SUFFIX

Always filter by _TABLE_SUFFIX to control which partitioned table files are scanned. Without this, BigQuery scans every table in the dataset — expensive at scale. The suffix is the YYYYMMDD string in the table name.

-- Yesterday only
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

-- Last 7 days
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
  AND
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

-- Specific date range
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

-- From a specific date forward
WHERE _TABLE_SUFFIX >= '20240101'
Pattern 04 Purchase events and revenue

The purchase event has both event_params and the ecommerce record. Revenue is accessible via both paths — ecommerce.purchase_revenue is simpler, but event_params contains the value parameter which is what GA4's own reports use. Both should match; if they don't, investigate your purchase event implementation.

-- Daily revenue and transaction count
SELECT
  PARSE_DATE('%Y%m%d', event_date)    AS date,
  COUNT(*)                              AS transactions,
  SUM(ecommerce.purchase_revenue)       AS revenue,

  -- Also accessible via event_params
  SUM((SELECT value.double_value
       FROM UNNEST(event_params)
       WHERE key = 'value'))            AS revenue_via_params,

  COUNT(DISTINCT
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'transaction_id'))    AS unique_transactions

FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX >= '20240101'
  AND event_name = 'purchase'
GROUP BY date
ORDER BY date DESC
Pattern 05 Item-level ecommerce analysis

The items column is a repeated record — one entry per product in a transaction. To analyse at the item level, UNNEST items. This creates one row per item per event rather than one row per event.

-- Top products by revenue
SELECT
  item.item_id,
  item.item_name,
  item.item_category,
  SUM(item.quantity)                  AS units_sold,
  SUM(item.item_revenue)              AS revenue
FROM
  `project.analytics_ID.events_*`,
  UNNEST(items) AS item
WHERE
  _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  AND event_name = 'purchase'
GROUP BY
  item.item_id, item.item_name, item.item_category
ORDER BY
  revenue DESC
LIMIT 50
Pattern 06 Session-level funnel analysis

To analyse which events occurred within each session — building a funnel — create a session-level CTE that aggregates the events that fired per session using COUNTIF or MAX with a boolean flag. This avoids the expensive cross-join approach.

-- Checkout funnel: how many sessions reached each step
WITH sessions AS (
  SELECT
    CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id'))      AS session_id,

    COUNTIF(event_name = 'view_item')     > 0  AS viewed_item,
    COUNTIF(event_name = 'add_to_cart')  > 0  AS added_to_cart,
    COUNTIF(event_name = 'begin_checkout') > 0  AS began_checkout,
    COUNTIF(event_name = 'purchase')      > 0  AS purchased

  FROM `project.analytics_ID.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
  GROUP BY session_id
)

SELECT
  COUNTIF(viewed_item)    AS sessions_viewed_item,
  COUNTIF(added_to_cart)  AS sessions_added_to_cart,
  COUNTIF(began_checkout) AS sessions_began_checkout,
  COUNTIF(purchased)      AS sessions_purchased
FROM sessions
Pattern 07 Traffic source and attribution

traffic_source contains session-level acquisition data — source, medium, and campaign. These are top-level record fields and don't require UNNEST. Note that traffic_source reflects the first session's attribution — for cross-session attribution analysis you need a more complex approach using the collected_traffic_source event parameter.

-- Sessions and revenue by channel
SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name                     AS campaign,
  COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id')))     AS sessions,
  SUM(IF(event_name = 'purchase',
    ecommerce.purchase_revenue, 0))      AS revenue

FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY 1, 2, 3
ORDER BY sessions DESC
Pattern 08 New vs returning users

The ga_session_number event parameter tells you which session number this is for the user — session number 1 means a new user in GA4's terminology. This is more reliable than using user_first_touch_timestamp for segmenting new vs returning.

-- New vs returning user sessions
SELECT
  CASE
    WHEN (SELECT value.int_value
          FROM UNNEST(event_params)
          WHERE key = 'ga_session_number') = 1
    THEN 'New'
    ELSE 'Returning'
  END                                     AS user_type,
  COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id')))     AS sessions

FROM `project.analytics_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d',
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND event_name = 'session_start'
GROUP BY user_type

Common gotchas — mistakes that produce silent wrong results

These mistakes don't cause query errors — they produce plausible-looking numbers that are quietly wrong.

Querying without a _TABLE_SUFFIX filter

Without a date filter, BigQuery scans all historical tables. The query runs (slowly and expensively) but returns correct data — it's a cost problem, not an accuracy problem. Still, always filter. On large datasets without partition pruning, a single unfiltered query can cost tens of dollars.

Using the wrong value sub-field

If you extract value.string_value for a parameter that stores its value in value.int_value, the result is NULL rather than an error. Always run a quick verification query first: SELECT event_params FROM ... LIMIT 1 and expand the array to see which value field contains the data you need.

Counting events instead of sessions

A query that counts rows where event_name = 'session_start' gives you session-start events, not sessions. Sessions require distinct session IDs. A user who has their session_start event fire twice in a session (rare but possible with certain GTM configurations) would be counted twice with a simple COUNT. The CONCAT + COUNT DISTINCT approach is the correct one.

Not accounting for duplicate purchase events

If your site has duplicate purchase event firing, BigQuery will contain the duplicates — it records exactly what GA4 received. Before summing revenue, check for duplicate transaction IDs: SELECT transaction_id, COUNT(*) FROM ... WHERE event_name = 'purchase' GROUP BY 1 HAVING COUNT(*) > 1. If you find duplicates, deduplicate using QUALIFY ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY event_timestamp) = 1.

Confusing traffic_source with session-level source

traffic_source reflects the user's first session acquisition — not the current session's source. For current-session attribution, use the collected_traffic_source event parameter available on session_start events. This distinction matters for multi-session conversion analysis where you want to credit the session that drove the conversion, not the first-ever visit.

A useful starting query for any new property: run SELECT event_name, COUNT(*) as count FROM `project.analytics_ID.events_*` WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) GROUP BY event_name ORDER BY count DESC first. This gives you a complete inventory of events in the export and their relative frequency — essential context before writing any analysis query.
Clean data makes better queries. The patterns above work correctly when the underlying GA4 data is clean. Staging traffic, duplicate tags, and broken conversion events all export to BigQuery faithfully. GA4 Health Check audits your property before you build on top of it. Run the audit — $79 →
Travis Gunn
Founder of GA4 Health Check. Working with Google Analytics since 2013, with over 250 clients audited across almost every industry vertical. 100% Job Success on Upwork for over a decade.