Standard GA4 reporting for content sites gives you pageviews, sessions, and average engagement time — useful baselines but blunt instruments for editorial decision-making. The questions content teams actually need to answer require combining multiple signals, filtering to specific URL patterns, and applying logic that GA4's interface can't express. BigQuery makes all of this possible.
The queries below assume you have the GA4 BigQuery export running and a basic familiarity with the schema. Replace project.analytics_PROPERTYID with your actual project and property ID throughout. For schema fundamentals, see the schema reference post.
Article performance — the full picture
This is the foundational content query. Engagement time comes from the engagement_time_msec parameter on user_engagement events. Scroll depth requires the scroll event — GA4's enhanced measurement fires it automatically at 90% scroll. Return readers are sessions where ga_session_number > 1.
WITH page_data AS (
SELECT
-- Clean URL: strip domain, query params, trailing slash
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_EXTRACT(
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location'),
r'https?://[^/]+(.*)'
), r'\?.*$', ''),
r'/$', ''
) AS page_path,
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_title') AS page_title,
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')) AS session_id,
user_pseudo_id,
event_name,
-- Engagement time in seconds
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'engagement_time_msec') / 1000 AS engagement_sec,
-- Is this a returning reader?
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_number') > 1 AS is_returning
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
AND event_name IN ('page_view', 'user_engagement', 'scroll')
)
SELECT
page_path,
APPROX_TOP_COUNT(page_title, 1)[OFFSET(0)].value AS title,
COUNT(DISTINCT session_id) AS sessions,
COUNT(DISTINCT user_pseudo_id) AS unique_readers,
ROUND(SUM(engagement_sec) / COUNT(DISTINCT session_id), 0)
AS avg_engagement_sec,
ROUND(COUNTIF(event_name = 'scroll') * 100.0
/ COUNT(DISTINCT session_id), 1) AS scroll_90_pct,
ROUND(COUNTIF(is_returning) * 100.0
/ COUNT(DISTINCT session_id), 1) AS return_reader_pct
FROM page_data
WHERE
page_path IS NOT NULL
AND REGEXP_CONTAINS(page_path, r'^/blog/') -- filter to articles
GROUP BY page_path
HAVING sessions >= 10
ORDER BY sessions DESC
LIMIT 100
Referral source quality — who sends real readers
WITH sessions AS (
SELECT
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')) AS session_id,
traffic_source.source,
traffic_source.medium,
event_name,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'engagement_time_msec') / 1000 AS engagement_sec
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
AND event_name IN ('session_start', 'user_engagement')
)
SELECT
source,
medium,
COUNT(DISTINCT session_id) AS sessions,
ROUND(SUM(engagement_sec)
/ COUNT(DISTINCT session_id), 0) AS avg_engagement_sec,
ROUND(COUNTIF(event_name = 'user_engagement') * 100.0
/ COUNT(DISTINCT session_id), 1) AS engaged_session_pct
FROM sessions
GROUP BY source, medium
HAVING sessions >= 50
ORDER BY avg_engagement_sec DESC
Content longevity — evergreen vs spike
A consistency score close to 1.0 means the article gets similar traffic every week — classic evergreen behaviour. A score close to 0 means most of the traffic came in one spike week. Combine this with total sessions to find articles that are both high-traffic and consistent.
WITH weekly AS (
SELECT
DATE_TRUNC(
PARSE_DATE('%Y%m%d', event_date), WEEK) AS week,
REGEXP_REPLACE(REGEXP_REPLACE(
REGEXP_EXTRACT(
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location'),
r'https?://[^/]+(.*)'),
r'\?.*$', ''), r'/$', '') AS page_path,
COUNT(DISTINCT CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))) AS sessions
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX >= '20240101'
AND event_name = 'page_view'
GROUP BY week, page_path
)
SELECT
page_path,
COUNT(DISTINCT week) AS weeks_with_traffic,
SUM(sessions) AS total_sessions,
ROUND(AVG(sessions), 0) AS avg_weekly_sessions,
MIN(sessions) AS min_weekly_sessions,
MAX(sessions) AS max_weekly_sessions,
ROUND(SAFE_DIVIDE(MIN(sessions),
MAX(sessions)), 2) AS consistency_score
FROM weekly
WHERE
page_path IS NOT NULL
AND REGEXP_CONTAINS(page_path, r'^/blog/')
GROUP BY page_path
HAVING
total_sessions >= 100
AND weeks_with_traffic >= 4 -- at least 4 weeks of data
ORDER BY consistency_score DESC, total_sessions DESC
LIMIT 50
Content velocity — new articles gaining momentum
Week-over-week growth in the first month after publication is a strong indicator of whether an article is gaining organic traction or relying on the initial promotional push. An article growing consistently after the first-week spike is building search visibility. One that drops sharply after day 3 got its traffic from social and won't compound.
WITH article_weeks AS (
SELECT
p.page_path,
p.publish_date,
DATE_DIFF(
DATE_TRUNC(PARSE_DATE('%Y%m%d', e.event_date), WEEK),
DATE_TRUNC(p.publish_date, WEEK),
WEEK
) AS week_num,
COUNT(DISTINCT CONCAT(e.user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(e.event_params)
WHERE key = 'ga_session_id'))) AS sessions
FROM `project.analytics_PROPERTYID.events_*` e
JOIN `project.content.publish_dates` p
ON REGEXP_REPLACE(REGEXP_REPLACE(
REGEXP_EXTRACT(
(SELECT value.string_value
FROM UNNEST(e.event_params)
WHERE key = 'page_location'),
r'https?://[^/]+(.*)'),
r'\?.*$', ''), r'/$', '') = p.page_path
WHERE
_TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
AND e.event_name = 'page_view'
AND p.publish_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY p.page_path, p.publish_date, week_num
)
SELECT
page_path,
publish_date,
SUM(IF(week_num = 0, sessions, 0)) AS week_1_sessions,
SUM(IF(week_num = 1, sessions, 0)) AS week_2_sessions,
SUM(IF(week_num = 2, sessions, 0)) AS week_3_sessions,
SUM(IF(week_num = 3, sessions, 0)) AS week_4_sessions,
SUM(sessions) AS total_sessions
FROM article_weeks
WHERE week_num BETWEEN 0 AND 3
GROUP BY page_path, publish_date
ORDER BY publish_date DESC
Internal linking performance — which articles drive onward journeys
WITH session_pages AS (
SELECT
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')) AS session_id,
REGEXP_REPLACE(REGEXP_REPLACE(
REGEXP_EXTRACT(
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location'),
r'https?://[^/]+(.*)'),
r'\?.*$', ''), r'/$', '') AS page_path,
RANK() OVER (
PARTITION BY CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))
ORDER BY event_timestamp
) AS page_rank,
COUNT(*) OVER (
PARTITION BY CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))
) AS pages_in_session
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
AND event_name = 'page_view'
)
SELECT
page_path,
COUNT(DISTINCT session_id) AS entry_sessions,
COUNTIF(pages_in_session > 1) * 100.0
/ COUNT(DISTINCT session_id) AS multi_page_pct,
ROUND(AVG(pages_in_session), 1) AS avg_pages_per_session
FROM session_pages
WHERE
page_rank = 1 -- first page in session only
AND REGEXP_CONTAINS(page_path, r'^/blog/')
GROUP BY page_path
HAVING entry_sessions >= 20
ORDER BY multi_page_pct DESC
LIMIT 50
Return reader cohorts — which articles build loyalty
WITH first_sessions AS (
-- Users whose first session included each article
SELECT
user_pseudo_id,
REGEXP_REPLACE(REGEXP_REPLACE(
REGEXP_EXTRACT(
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location'),
r'https?://[^/]+(.*)'),
r'\?.*$', ''), r'/$', '') AS article_path,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS first_visit_date
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX >= '20240101'
AND event_name = 'page_view'
AND (SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_number') = 1
GROUP BY user_pseudo_id, article_path
),
return_visits AS (
-- Which users returned within 30 days
SELECT DISTINCT
user_pseudo_id,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS return_date
FROM `project.analytics_PROPERTYID.events_*`
WHERE
_TABLE_SUFFIX >= '20240101'
AND event_name = 'session_start'
AND (SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_number') > 1
GROUP BY user_pseudo_id
)
SELECT
f.article_path,
COUNT(DISTINCT f.user_pseudo_id) AS new_readers,
COUNTIF(
r.return_date IS NOT NULL
AND DATE_DIFF(r.return_date,
f.first_visit_date, DAY) <= 30
) AS returned_within_30d,
ROUND(COUNTIF(
r.return_date IS NOT NULL
AND DATE_DIFF(r.return_date,
f.first_visit_date, DAY) <= 30
) * 100.0 / COUNT(DISTINCT f.user_pseudo_id), 1)
AS return_rate_pct
FROM first_sessions f
LEFT JOIN return_visits r USING (user_pseudo_id)
WHERE
REGEXP_CONTAINS(f.article_path, r'^/blog/')
GROUP BY f.article_path
HAVING new_readers >= 20
ORDER BY return_rate_pct DESC
LIMIT 50
