Standard GA4 reporting for content sites gives you pageviews, sessions, and average engagement time — useful baselines but blunt instruments for editorial decision-making. The questions content teams actually need to answer require combining multiple signals, filtering to specific URL patterns, and applying logic that GA4's interface can't express. BigQuery makes all of this possible.

The queries below assume you have the GA4 BigQuery export running and a basic familiarity with the schema. Replace project.analytics_PROPERTYID with your actual project and property ID throughout. For schema fundamentals, see the schema reference post.

URL cleaning is essential for content queries. Every query that groups by article URL needs to strip query parameters and trailing slashes, otherwise the same article appears as dozens of rows. A standard cleaning pattern appears throughout these queries — adapt the domain and URL structure to match your site.

Article performance — the full picture

Query 01
Article performance scorecard
Sessions, unique readers, average engagement time, scroll depth, and return visit rate per article — the metrics that together tell you whether an article is earning attention or just getting clicks.

This is the foundational content query. Engagement time comes from the engagement_time_msec parameter on user_engagement events. Scroll depth requires the scroll event — GA4's enhanced measurement fires it automatically at 90% scroll. Return readers are sessions where ga_session_number > 1.

WITH page_data AS (
  SELECT
    -- Clean URL: strip domain, query params, trailing slash
    REGEXP_REPLACE(
      REGEXP_REPLACE(
        REGEXP_EXTRACT(
          (SELECT value.string_value
           FROM UNNEST(event_params)
           WHERE key = 'page_location'),
          r'https?://[^/]+(.*)'
        ), r'\?.*$', ''),
      r'/$', ''
    )                                           AS page_path,

    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_title')              AS page_title,

    CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id'))        AS session_id,

    user_pseudo_id,
    event_name,

    -- Engagement time in seconds
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec') / 1000  AS engagement_sec,

    -- Is this a returning reader?
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_number') > 1    AS is_returning

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND event_name IN ('page_view', 'user_engagement', 'scroll')
)

SELECT
  page_path,
  APPROX_TOP_COUNT(page_title, 1)[OFFSET(0)].value   AS title,
  COUNT(DISTINCT session_id)                         AS sessions,
  COUNT(DISTINCT user_pseudo_id)                     AS unique_readers,
  ROUND(SUM(engagement_sec) / COUNT(DISTINCT session_id), 0)
                                                      AS avg_engagement_sec,
  ROUND(COUNTIF(event_name = 'scroll') * 100.0
    / COUNT(DISTINCT session_id), 1)              AS scroll_90_pct,
  ROUND(COUNTIF(is_returning) * 100.0
    / COUNT(DISTINCT session_id), 1)              AS return_reader_pct

FROM page_data
WHERE
  page_path IS NOT NULL
  AND REGEXP_CONTAINS(page_path, r'^/blog/')  -- filter to articles
GROUP BY page_path
HAVING sessions >= 10
ORDER BY sessions DESC
LIMIT 100

Referral source quality — who sends real readers

Query 02
Referral source quality by engagement
Not all traffic sources are equal. Social traffic often arrives, glances, and leaves. Organic search traffic often arrives with intent and reads. Newsletter traffic often has the highest engagement of all. This query ranks your referral sources by engagement quality, not just volume.
WITH sessions AS (
  SELECT
    CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id'))      AS session_id,
    traffic_source.source,
    traffic_source.medium,
    event_name,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'engagement_time_msec') / 1000  AS engagement_sec

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND event_name IN ('session_start', 'user_engagement')
)

SELECT
  source,
  medium,
  COUNT(DISTINCT session_id)                     AS sessions,
  ROUND(SUM(engagement_sec)
    / COUNT(DISTINCT session_id), 0)             AS avg_engagement_sec,
  ROUND(COUNTIF(event_name = 'user_engagement') * 100.0
    / COUNT(DISTINCT session_id), 1)            AS engaged_session_pct

FROM sessions
GROUP BY source, medium
HAVING sessions >= 50
ORDER BY avg_engagement_sec DESC

Content longevity — evergreen vs spike

Query 03
Evergreen content identification
Articles that get consistent traffic week over week are structurally more valuable than ones that spike and die. This query calculates traffic consistency — the ratio of minimum weekly sessions to maximum weekly sessions — as a proxy for evergreen value.

A consistency score close to 1.0 means the article gets similar traffic every week — classic evergreen behaviour. A score close to 0 means most of the traffic came in one spike week. Combine this with total sessions to find articles that are both high-traffic and consistent.

WITH weekly AS (
  SELECT
    DATE_TRUNC(
      PARSE_DATE('%Y%m%d', event_date), WEEK)   AS week,

    REGEXP_REPLACE(REGEXP_REPLACE(
      REGEXP_EXTRACT(
        (SELECT value.string_value
         FROM UNNEST(event_params)
         WHERE key = 'page_location'),
        r'https?://[^/]+(.*)'),
      r'\?.*$', ''), r'/$', '')             AS page_path,

    COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id')))         AS sessions

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX >= '20240101'
    AND event_name = 'page_view'
  GROUP BY week, page_path
)

SELECT
  page_path,
  COUNT(DISTINCT week)                           AS weeks_with_traffic,
  SUM(sessions)                                   AS total_sessions,
  ROUND(AVG(sessions), 0)                         AS avg_weekly_sessions,
  MIN(sessions)                                   AS min_weekly_sessions,
  MAX(sessions)                                   AS max_weekly_sessions,
  ROUND(SAFE_DIVIDE(MIN(sessions),
    MAX(sessions)), 2)                            AS consistency_score

FROM weekly
WHERE
  page_path IS NOT NULL
  AND REGEXP_CONTAINS(page_path, r'^/blog/')
GROUP BY page_path
HAVING
  total_sessions >= 100
  AND weeks_with_traffic >= 4   -- at least 4 weeks of data
ORDER BY consistency_score DESC, total_sessions DESC
LIMIT 50

Content velocity — new articles gaining momentum

Query 04
New article momentum tracker
For articles published in the last 30 days, tracks whether traffic is growing, flat, or declining in the first weeks after publication. Requires a publish date lookup table — a simple BigQuery table you maintain with page_path and publish_date columns.

Week-over-week growth in the first month after publication is a strong indicator of whether an article is gaining organic traction or relying on the initial promotional push. An article growing consistently after the first-week spike is building search visibility. One that drops sharply after day 3 got its traffic from social and won't compound.

WITH article_weeks AS (
  SELECT
    p.page_path,
    p.publish_date,
    DATE_DIFF(
      DATE_TRUNC(PARSE_DATE('%Y%m%d', e.event_date), WEEK),
      DATE_TRUNC(p.publish_date, WEEK),
      WEEK
    )                                             AS week_num,
    COUNT(DISTINCT CONCAT(e.user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(e.event_params)
       WHERE key = 'ga_session_id')))          AS sessions

  FROM `project.analytics_PROPERTYID.events_*` e
  JOIN `project.content.publish_dates` p
    ON REGEXP_REPLACE(REGEXP_REPLACE(
        REGEXP_EXTRACT(
          (SELECT value.string_value
           FROM UNNEST(e.event_params)
           WHERE key = 'page_location'),
          r'https?://[^/]+(.*)'),
        r'\?.*$', ''), r'/$', '') = p.page_path
  WHERE
    _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d',
      DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
    AND e.event_name = 'page_view'
    AND p.publish_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY p.page_path, p.publish_date, week_num
)

SELECT
  page_path,
  publish_date,
  SUM(IF(week_num = 0, sessions, 0))   AS week_1_sessions,
  SUM(IF(week_num = 1, sessions, 0))   AS week_2_sessions,
  SUM(IF(week_num = 2, sessions, 0))   AS week_3_sessions,
  SUM(IF(week_num = 3, sessions, 0))   AS week_4_sessions,
  SUM(sessions)                          AS total_sessions

FROM article_weeks
WHERE week_num BETWEEN 0 AND 3
GROUP BY page_path, publish_date
ORDER BY publish_date DESC

Internal linking performance — which articles drive onward journeys

Query 05
Articles that drive multi-page sessions
An article that consistently leads readers to explore more of the site is more valuable than its session count suggests — it's driving depth of engagement rather than just a single-page visit. This query identifies articles with the highest multi-page session rate.
WITH session_pages AS (
  SELECT
    CONCAT(user_pseudo_id, '.',
      (SELECT value.int_value
       FROM UNNEST(event_params)
       WHERE key = 'ga_session_id'))          AS session_id,

    REGEXP_REPLACE(REGEXP_REPLACE(
      REGEXP_EXTRACT(
        (SELECT value.string_value
         FROM UNNEST(event_params)
         WHERE key = 'page_location'),
        r'https?://[^/]+(.*)'),
      r'\?.*$', ''), r'/$', '')               AS page_path,

    RANK() OVER (
      PARTITION BY CONCAT(user_pseudo_id, '.',
        (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_id'))
      ORDER BY event_timestamp
    )                                             AS page_rank,

    COUNT(*) OVER (
      PARTITION BY CONCAT(user_pseudo_id, '.',
        (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_id'))
    )                                             AS pages_in_session

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
    AND event_name = 'page_view'
)

SELECT
  page_path,
  COUNT(DISTINCT session_id)                    AS entry_sessions,
  COUNTIF(pages_in_session > 1) * 100.0
    / COUNT(DISTINCT session_id)                  AS multi_page_pct,
  ROUND(AVG(pages_in_session), 1)               AS avg_pages_per_session

FROM session_pages
WHERE
  page_rank = 1                          -- first page in session only
  AND REGEXP_CONTAINS(page_path, r'^/blog/')
GROUP BY page_path
HAVING entry_sessions >= 20
ORDER BY multi_page_pct DESC
LIMIT 50

Return reader cohorts — which articles build loyalty

Query 06
Articles that convert first-time visitors into returning readers
Identifies which articles are most effective at bringing new readers back — by finding users whose first session included a specific article, then checking whether they returned within 30 days.
WITH first_sessions AS (
  -- Users whose first session included each article
  SELECT
    user_pseudo_id,
    REGEXP_REPLACE(REGEXP_REPLACE(
      REGEXP_EXTRACT(
        (SELECT value.string_value
         FROM UNNEST(event_params)
         WHERE key = 'page_location'),
        r'https?://[^/]+(.*)'),
      r'\?.*$', ''), r'/$', '')             AS article_path,
    MIN(PARSE_DATE('%Y%m%d', event_date))     AS first_visit_date

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX >= '20240101'
    AND event_name = 'page_view'
    AND (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_number') = 1
  GROUP BY user_pseudo_id, article_path
),

return_visits AS (
  -- Which users returned within 30 days
  SELECT DISTINCT
    user_pseudo_id,
    MIN(PARSE_DATE('%Y%m%d', event_date)) AS return_date

  FROM `project.analytics_PROPERTYID.events_*`
  WHERE
    _TABLE_SUFFIX >= '20240101'
    AND event_name = 'session_start'
    AND (SELECT value.int_value
         FROM UNNEST(event_params)
         WHERE key = 'ga_session_number') > 1
  GROUP BY user_pseudo_id
)

SELECT
  f.article_path,
  COUNT(DISTINCT f.user_pseudo_id)              AS new_readers,
  COUNTIF(
    r.return_date IS NOT NULL
    AND DATE_DIFF(r.return_date,
      f.first_visit_date, DAY) <= 30
  )                                               AS returned_within_30d,
  ROUND(COUNTIF(
    r.return_date IS NOT NULL
    AND DATE_DIFF(r.return_date,
      f.first_visit_date, DAY) <= 30
  ) * 100.0 / COUNT(DISTINCT f.user_pseudo_id), 1)
                                                  AS return_rate_pct

FROM first_sessions f
LEFT JOIN return_visits r USING (user_pseudo_id)
WHERE
  REGEXP_CONTAINS(f.article_path, r'^/blog/')
GROUP BY f.article_path
HAVING new_readers >= 20
ORDER BY return_rate_pct DESC
LIMIT 50
Combine these queries in Looker Studio. Each query above produces a table that can be saved as a BigQuery view and connected to Looker Studio as a separate data source. Use blended data sources in Looker Studio to combine article performance, evergreen score, and return reader rate into a single editorial dashboard. The article path is the join key across all of them.
These queries reflect your GA4 tracking quality. Staging traffic in your sessions data, broken scroll tracking, and missing engagement events all affect the metrics these queries produce. GA4 Health Check identifies tracking issues before you build editorial intelligence on top of them. Run the audit — $79 →
Travis Gunn
Founder of GA4 Health Check. Working with Google Analytics since 2013, with over 250 clients audited across almost every industry vertical. 100% Job Success on Upwork for over a decade.