GA4 data quality is not a single problem. It is a category of problems — each with a different cause, a different signature in your reports, and a different fix. A property can have excellent event tracking and completely broken attribution. It can have clean UTMs and staging traffic contaminating every session count. It can pass a surface-level review and be quietly losing historical data every single day.
This guide covers every major category of GA4 data quality failure we encounter across property audits. Use it as a reference when something looks wrong, as a checklist when onboarding a new property, or as the foundation for a data quality programme across your analytics setup.
| Category | What it affects | Severity |
|---|---|---|
| Data retention | Historical analysis, year-over-year comparisons | Critical |
| Data contamination | Session counts, conversion rates, all metrics | Critical |
| Attribution accuracy | Channel performance, campaign ROI, budget decisions | Critical |
| Event tracking integrity | Conversions, funnels, Smart Bidding signals | Critical |
| Privacy and compliance | EU conversion data, regulatory exposure | High |
| Property configuration | Report accuracy, benchmarking, integrations | High |
| Sampling and thresholding | Report completeness, segment accuracy | Medium |
Category 1: Data Retention
GA4 ships with a default event data retention period of 2 months. User-level and event-level data, the data that powers Explorations, custom funnels, path analysis, and cohort reports, is permanently deleted once it ages past that window. Standard aggregated reports are not affected by this setting, which is one reason the loss tends to go unnoticed. There is no warning. There is no recovery option.
The impact is most visible when you try to do year-over-year analysis in Explorations, compare a current campaign against one from six months ago, or understand how a customer cohort from Q1 behaves by Q3. With 2-month retention, none of that is possible at user or event level. The data simply does not exist anymore.
How to detect it
Go to Admin → Data Settings → Data Retention. If the Event data retention field reads "2 months," your property is actively deleting data right now. This takes 10 seconds to check.
How to fix it
Change Event data retention to 14 months, the maximum for standard (non-360) GA4 properties. Enable "Reset user data on new activity" so that each new event from a returning user restarts that user's retention window. The new setting only protects data from this point on; anything already deleted stays deleted. For permanent retention beyond 14 months, set up a BigQuery export under Admin → Product Links → BigQuery Links.
Category 2: Data Contamination
Staging and development traffic
GA4 fires wherever your measurement ID is deployed — including staging environments, development branches, and localhost. Without a hostname filter, every developer browsing the site, every QA test run, and every automated pipeline that loads a page registers as a real user session in your production property.
The effect on your metrics is systematic. Session counts are inflated, and because test sessions rarely convert, conversion rate is suppressed. Engagement rate, pages per session, and average session duration are all skewed as well. Properties with active development teams can have 10–25% of sessions originating from non-production environments.
To check: build a free-form Exploration with Hostname as the dimension and Sessions as the metric, and look for anything that isn't your production domain. To fix: prevent the GA4 tag from firing on non-production hostnames via GTM, or point non-production environments at a separate test property. The referral exclusion list does not help here, because it only changes how referrers are attributed and does not stop hits from being collected. Full guide to removing staging traffic →
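To put a number on the contamination, the hostname breakdown can be exported and summarised. The following is a minimal sketch, assuming the export has been reduced to (hostname, sessions) pairs; the `PRODUCTION_HOSTS` set and the example hostnames are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical production hostnames; replace with your own.
PRODUCTION_HOSTS = {"www.example.com", "example.com"}

def non_production_share(rows, production_hosts=PRODUCTION_HOSTS):
    """rows: iterable of (hostname, sessions) pairs from a hostname export.

    Returns (share, offenders): the fraction of sessions recorded on
    hostnames outside production_hosts, and the sessions per offender.
    """
    total = 0
    offenders = defaultdict(int)
    for hostname, sessions in rows:
        total += sessions
        host = hostname.strip().lower()
        if host not in production_hosts:
            offenders[host] += sessions
    share = sum(offenders.values()) / total if total else 0.0
    return share, dict(offenders)

rows = [
    ("www.example.com", 800),
    ("staging.example.com", 150),
    ("localhost", 50),
]
share, offenders = non_production_share(rows)
print(f"{share:.1%} of sessions are non-production: {offenders}")
```

An allow-list of production hostnames is used rather than a block-list of staging ones, so that new preview or branch environments are flagged without anyone having to remember to add them.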
Bot and spam traffic
GA4 has built-in bot filtering that excludes known bots and spiders automatically. But sophisticated bots that execute JavaScript can still pass through and generate events. Signs of bot traffic include unusual spikes in sessions with very low engagement rates, implausibly high pages-per-session, and traffic from unexpected geographic locations at unusual hours. GA4's bot filtering cannot be turned off, but its coverage is not comprehensive. For properties with significant bot traffic concerns, additional filtering via server-side tagging provides more control.
Internal traffic
Your own team browsing the site counts as real user sessions unless explicitly filtered. This is particularly impactful for smaller sites where internal team traffic represents a meaningful percentage of total sessions. Define an internal traffic rule in Admin → Data Streams → [your web stream] → Configure tag settings → Define internal traffic using your office IP address or IP range, then create a data filter in Admin → Data Settings → Data Filters to exclude it from reports. New data filters start in the Testing state and exclude nothing until you switch them to Active.
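Before entering ranges into an internal traffic rule, it can help to confirm that they actually cover the addresses your team uses. This is a small sketch using Python's standard `ipaddress` module; the ranges shown are reserved documentation addresses standing in for a real office network.

```python
import ipaddress

# Hypothetical office ranges (reserved documentation addresses); use your own.
INTERNAL_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole office subnet
    ipaddress.ip_network("198.51.100.17/32"),  # a single VPN exit address
]

def is_internal(ip, ranges=INTERNAL_RANGES):
    """Return True if ip falls inside any of the configured CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)

for ip in ["203.0.113.42", "198.51.100.17", "198.51.100.18"]:
    print(ip, "internal" if is_internal(ip) else "external")
```

Remote staff on home connections will not match any office range, so IP rules alone usually undercount internal traffic.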
Category 3: Attribution Accuracy
UTM casing inconsistency
GA4's channel attribution is case-sensitive. utm_medium=email and utm_medium=Email are treated as entirely different values and assigned to different channel groups. In practice, a single email channel often appears fragmented across multiple rows in acquisition reports — with sessions falling into Unassigned rather than Email because the medium value doesn't match GA4's channel grouping rules.
The fix requires two things: standardising UTM conventions going forward by enforcing lowercase and defining a controlled vocabulary for medium values, and creating custom channel grouping rules in GA4 to catch historical variants that can't be re-tagged. Full guide to UTM inconsistency →
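A controlled vocabulary is easiest to enforce when it exists as data. The sketch below lowercases medium values, maps known variants to a canonical value, and reports which canonical mediums appear under more than one spelling. The `MEDIUM_ALIASES` mapping is a hypothetical example and not a GA4 standard.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: canonical medium -> accepted variants.
MEDIUM_ALIASES = {
    "email": {"email", "e-mail", "e_mail", "newsletter"},
    "cpc": {"cpc", "ppc", "paid-search", "paidsearch"},
    "social": {"social", "social-media", "social_media"},
}
_LOOKUP = {v: canon for canon, variants in MEDIUM_ALIASES.items() for v in variants}

def normalise_medium(raw):
    """Lowercase and trim a utm_medium value, then map it to its canonical form."""
    key = raw.strip().lower()
    return _LOOKUP.get(key, key)

def group_variants(mediums):
    """Return {canonical: [raw spellings]} for mediums seen under several spellings."""
    groups = defaultdict(set)
    for m in mediums:
        groups[normalise_medium(m)].add(m)
    return {canon: sorted(vs) for canon, vs in groups.items() if len(vs) > 1}

observed = ["email", "Email", "E-Mail", "cpc", "CPC", "organic"]
print(group_variants(observed))
# {'email': ['E-Mail', 'Email', 'email'], 'cpc': ['CPC', 'cpc']}
```

The same mapping can drive both a link-builder that rejects non-canonical values and the list of variants to cover in custom channel group rules.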
Payment processor referral hijacking
When a user leaves your site to complete payment on PayPal, Stripe, or Klarna and returns to your confirmation page, GA4 starts a new session attributed to the payment processor's domain. The purchase event fires in this new session, so the payment processor receives credit for a conversion that your actual marketing channels drove. The result is visible in your acquisition report: paypal.com or checkout.stripe.com appears as a top revenue source while every real marketing channel looks like it is underperforming. To fix: add the payment processor domains under Admin → Data Streams → [your web stream] → Configure tag settings → List unwanted referrals. Full guide to payment processor attribution →
Missing cross-domain tracking
If your purchase journey spans multiple domains, for example your main site plus a checkout hosted on a separate domain or a third-party checkout platform, GA4 will create a new session at every domain boundary unless cross-domain tracking is explicitly configured. Subdomains of the same root domain share cookies and do not need it. A missing configuration fragments the user journey and breaks attribution for any conversion that involves a domain transition. Configure cross-domain tracking in Admin → Data Streams → Configure tag settings → Configure your domains, and include every domain involved in the user journey.
UTM-tagged internal links
UTM parameters on internal links overwrite the traffic source. If your navigation, banners, or internal promotions carry UTM parameters, every click on those links is recorded as a new campaign touchpoint with whatever source the UTM says, displacing the original acquisition source in attribution. Users who arrived from paid search and then clicked a UTM-tagged homepage banner can have their conversion credited to "homepage_banner" rather than Google Ads. Audit all internal links and remove UTM parameters from any link that stays within your own domain.
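The audit can be partly automated once you have a list of link targets from a crawl. This is a minimal sketch, assuming the hrefs have already been extracted; the function name and example URLs are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def internal_utm_links(hrefs, own_domain):
    """Return hrefs that stay on own_domain (or a subdomain) and carry utm_* params."""
    flagged = []
    for href in hrefs:
        parts = urlparse(href)
        host = (parts.hostname or "").lower()
        # Relative links have no hostname and are treated as internal.
        is_internal = host == "" or host == own_domain or host.endswith("." + own_domain)
        params = parse_qs(parts.query, keep_blank_values=True)
        has_utm = any(key.lower().startswith("utm_") for key in params)
        if is_internal and has_utm:
            flagged.append(href)
    return flagged

hrefs = [
    "/sale?utm_source=homepage_banner",
    "https://www.example.com/pricing?utm_medium=nav",
    "https://partner.example.org/?utm_source=example",
    "https://www.example.com/about",
]
print(internal_utm_links(hrefs, "example.com"))
```

Outbound links with UTM parameters are left alone, since tagging traffic you send to other sites does not affect your own attribution.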
Category 4: Event Tracking Integrity
Zombie conversion events
A zombie conversion event is an event marked as a conversion in GA4 that hasn't actually fired in weeks or months. It appears in your conversion list, contributes zero to your conversion count, and Smart Bidding treats it as a valid signal — even though it represents nothing real. Zombie conversions are usually caused by site changes that broke a tag, event names that were renamed without updating GA4, or events that were added and never properly implemented.
To find them: go to Admin → Events and check the last-fired date for every event marked as a conversion. Any conversion event that hasn't fired in 14 days on an active site is a candidate for investigation. Remove the conversion flag from any event that no longer represents a real business outcome.
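The 14-day rule is simple to apply once the last-fired dates are written down. The sketch below assumes a hand-built or exported mapping of conversion event name to the date it last fired (or `None` if it has never fired); that input shape is an assumption for illustration.

```python
from datetime import date, timedelta

def find_zombies(last_fired, today, max_idle_days=14):
    """Return conversion events idle for more than max_idle_days, or never fired.

    last_fired: dict of event name -> date of last occurrence, or None.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, last in last_fired.items()
        if last is None or last < cutoff
    )

last_fired = {
    "purchase": date(2024, 6, 29),
    "generate_lead": date(2024, 5, 2),
    "demo_request": None,  # marked as a conversion but never implemented
}
print(find_zombies(last_fired, today=date(2024, 6, 30)))
# ['demo_request', 'generate_lead']
```

For low-volume or seasonal conversions, a longer `max_idle_days` may be more appropriate than the 14-day default.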
Duplicate event firing
When an event fires twice for a single user action, typically a purchase event firing on both a GTM trigger and a hardcoded tag, your conversion counts and revenue figures are inflated. This makes performance look better than it is, which can lead to over-investing in campaigns that are not actually earning their budget. Signs of duplicate firing: GA4 revenue is significantly higher than your e-commerce platform's order revenue for the same period. Diagnose using GA4 DebugView while completing a test conversion. Each user action should produce exactly one event.
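If you can capture raw purchase hits (for example from a tag-level log or an event export), repeated transaction IDs are a direct signal of double firing. This is a sketch under that assumption; the event dictionaries are a hypothetical shape holding only `transaction_id` and `value`.

```python
from collections import Counter

def duplicate_transactions(purchase_events):
    """Return {transaction_id: count} for IDs seen more than once."""
    counts = Counter(e["transaction_id"] for e in purchase_events)
    return {tid: n for tid, n in counts.items() if n > 1}

def inflated_revenue(purchase_events):
    """Sum the revenue counted beyond the first occurrence of each transaction_id."""
    seen = set()
    extra = 0.0
    for e in purchase_events:
        if e["transaction_id"] in seen:
            extra += e["value"]
        else:
            seen.add(e["transaction_id"])
    return extra

events = [
    {"transaction_id": "T-1001", "value": 49.0},
    {"transaction_id": "T-1001", "value": 49.0},  # second hit for the same order
    {"transaction_id": "T-1002", "value": 120.0},
]
print(duplicate_transactions(events))  # {'T-1001': 2}
print(inflated_revenue(events))        # 49.0
```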
Missing e-commerce funnel events
GA4's e-commerce funnel requires a specific sequence of events: view_item, add_to_cart, view_cart, begin_checkout, add_shipping_info, add_payment_info, purchase. Each missing event is a funnel step you cannot see. The most commonly absent is add_payment_info — without it, you cannot determine whether checkout abandonment happens before or after payment details are entered. These require completely different solutions. Full guide to missing checkout events →
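Checking funnel completeness amounts to comparing the event names a property actually receives against the seven listed above. A minimal sketch, assuming you have the set of observed event names from the Events list or an export:

```python
# The seven funnel events named above, in funnel order.
FUNNEL_EVENTS = [
    "view_item",
    "add_to_cart",
    "view_cart",
    "begin_checkout",
    "add_shipping_info",
    "add_payment_info",
    "purchase",
]

def missing_funnel_events(observed_event_names):
    """Return the funnel events not present in the observed names, in funnel order."""
    observed = set(observed_event_names)
    return [name for name in FUNNEL_EVENTS if name not in observed]

observed = ["page_view", "view_item", "add_to_cart", "begin_checkout", "purchase"]
print(missing_funnel_events(observed))
# ['view_cart', 'add_shipping_info', 'add_payment_info']
```

The comparison is exact and case-sensitive on purpose: an event arriving as `Add_To_Cart` would be reported as missing, which is how GA4 would treat it too.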
Purchase event parameter errors
The purchase event requires specific parameters: transaction_id, value, currency, and an items array. Missing or malformed parameters lead to revenue discrepancies, product-level reporting gaps, and incorrect conversion values being sent to Google Ads. Validate purchase event parameters in DebugView by inspecting the full event payload during a real or test purchase.
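The same checks can be scripted against a payload copied out of DebugView or a dataLayer push. This sketch validates only the four parameters named above; the specific rules (a three-letter uppercase currency code, each item carrying `item_id` or `item_name`) reflect GA4's documented expectations as best understood here, and the function itself is hypothetical.

```python
def validate_purchase(params):
    """Return a list of problems found in a purchase event's parameters."""
    problems = []

    tid = params.get("transaction_id")
    if not isinstance(tid, str) or not tid.strip():
        problems.append("transaction_id missing or empty")

    value = params.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value missing or not numeric")

    currency = params.get("currency")
    if not (isinstance(currency, str) and len(currency) == 3
            and currency.isalpha() and currency.isupper()):
        problems.append("currency is not a 3-letter uppercase code")

    items = params.get("items")
    if not isinstance(items, list) or not items:
        problems.append("items missing or empty")
    else:
        for i, item in enumerate(items):
            if not isinstance(item, dict) or not (item.get("item_id") or item.get("item_name")):
                problems.append(f"items[{i}] needs item_id or item_name")

    return problems

good = {
    "transaction_id": "T-1001",
    "value": 49.99,
    "currency": "GBP",
    "items": [{"item_id": "SKU-1", "item_name": "Mug", "price": 49.99, "quantity": 1}],
}
bad = {"transaction_id": "", "value": "49.99", "currency": "gbp", "items": []}

print(validate_purchase(good))  # []
print(validate_purchase(bad))
```

A value sent as the string `"49.99"` is flagged here because quoted numbers are one of the more common causes of revenue discrepancies.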
Event naming convention violations
GA4 recommends snake_case event names, and event names are case-sensitive. Inconsistent naming (mixing camelCase, spaces, hyphens, and uppercase) creates fragmented event data where the same user action appears as multiple distinct events. formSubmit, form_submit, and Form Submit are three different events in GA4 even if they represent the same thing. Standardise event naming across your implementation and use GA4's Modify event feature to consolidate variants as they arrive. Modifications apply only to data collected after the rule is created, so historical variants have to be combined at analysis time instead.
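A naming convention is easier to enforce with a check that can run against a tagging plan or an exported event list. The sketch below tests names against a snake_case pattern and proposes a normalised form; the regular expression is one reasonable definition of snake_case, not an official GA4 rule.

```python
import re

# Lowercase words of letters and digits, separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def to_snake_case(name):
    """Convert camelCase, spaced, or hyphenated names to snake_case."""
    # Insert an underscore at each lowercase-or-digit to uppercase boundary.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)
    return s.strip("_").lower()

for raw in ["formSubmit", "form_submit", "Form Submit", "form-submit"]:
    ok = bool(SNAKE_CASE.match(raw))
    print(f"{raw!r}: valid={ok}, suggested={to_snake_case(raw)!r}")
```

All four spellings map to `form_submit`, which makes the function usable as a grouping key when combining historical variants in an export.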
Category 5: Privacy and Compliance
Consent Mode v2 not implemented
Consent Mode v2 has been required for EEA traffic since March 2024. Without it, GA4 cannot collect any data from users who decline cookie consent — and cannot model their behaviour. For sites with significant EU traffic, this means 30–40% of conversion data is simply absent from reports. Smart Bidding for EU campaigns optimises against an incomplete signal, directly degrading performance.
Consent Mode v2 requires four consent signals: ad_storage, analytics_storage, ad_user_data, and ad_personalization. Implement via a Google-certified CMP using Advanced mode to enable conversion modelling. Verify in Admin → Data Streams → [your web stream] → Consent settings. Full guide to Consent Mode v2 →
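A quick way to spot a v1-only implementation is to check a captured consent state for all four signals. This sketch assumes the state has been copied from the site's consent default or update call into a plain dictionary; the function name and input shape are hypothetical.

```python
REQUIRED_SIGNALS = (
    "ad_storage",
    "analytics_storage",
    "ad_user_data",
    "ad_personalization",
)
VALID_STATES = {"granted", "denied"}

def consent_gaps(consent_state):
    """Return {signal: problem} for required signals that are missing or malformed."""
    gaps = {}
    for signal in REQUIRED_SIGNALS:
        state = consent_state.get(signal)
        if state is None:
            gaps[signal] = "missing"
        elif state not in VALID_STATES:
            gaps[signal] = f"invalid value {state!r}"
    return gaps

# A Consent Mode v1 setup typically sends only the first two signals.
v1_state = {"ad_storage": "denied", "analytics_storage": "granted"}
print(consent_gaps(v1_state))
# {'ad_user_data': 'missing', 'ad_personalization': 'missing'}
```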
PII in event data
Google's terms of service prohibit sending personally identifiable information to GA4. Email addresses in page URLs (common in email confirmation flows), names or phone numbers in form field parameters, and traceable user IDs all constitute PII violations. Beyond the contractual issue, PII in GA4 creates regulatory exposure under GDPR. Scan page URL data in Explorations for patterns resembling email addresses and review all custom event parameters for any that capture user-entered data.
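For the URL scan, a regular expression over exported page locations catches the most common case. The sketch below decodes percent-encoding first, since addresses in query strings often arrive as `%40` instead of `@`. The pattern is a pragmatic approximation of an email address, not a full validator, and the example URLs are made up.

```python
import re
from urllib.parse import unquote

# Pragmatic email pattern: local part, "@", domain with a dot and a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def urls_with_email(urls):
    """Return the URLs that appear to contain an email address."""
    return [u for u in urls if EMAIL_RE.search(unquote(u))]

urls = [
    "https://www.example.com/confirm?email=jane.doe%40example.org",
    "https://www.example.com/thanks?order=12345",
    "https://www.example.com/unsubscribe/sam@example.net",
]
for u in urls_with_email(urls):
    print("possible PII:", u)
```

Phone numbers and names are harder to pattern-match reliably, so those are better handled by reviewing which parameters capture user-entered text in the first place.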
Data redaction settings
GA4 does not log or store individual IP addresses, so there is no IP anonymisation setting to switch on. However, the data redaction settings under Admin → Data Streams → [your web stream] → Redact data deserve a review. Confirm that URL query parameter redaction is appropriately configured for any parameters that might carry user-identifiable information, particularly on sites with user-specific URLs or confirmation pages that embed order or account details.
Category 6: Property Configuration
Wrong timezone
GA4's timezone setting determines where day boundaries fall in your reports. A property set to UTC for a UK-based business is aligned in winter but runs one hour behind local time during British Summer Time, so for roughly half the year every daily report cuts off at 1 a.m. local time instead of midnight. For businesses further from UTC the offset is larger. Day-over-day analysis, daypart reporting, and any time-based comparison is consistently wrong: not catastrophically, but in a way that compounds across every date-sensitive decision. Check Admin → Property Settings and confirm the timezone matches the business's primary operating market.
Wrong attribution model
GA4 defaults to data-driven attribution for new properties, with last-click models available as alternatives. The attribution model affects how conversion credit is distributed across touchpoints. Check Admin → Attribution Settings and confirm the model reflects how your business evaluates marketing performance. Note: changing the attribution model applies to historical data as well as future data, which can be disorienting in the middle of a campaign analysis. Document the current setting before changing it.
Custom dimension quota approaching limit
A standard GA4 property allows 50 event-scoped custom dimensions, 25 user-scoped custom dimensions, and 50 custom metrics. Once a limit is reached, no new definitions of that type can be added without archiving existing ones, and archiving a custom dimension removes it from reports, including historical ones. Audit your custom dimension list in Admin → Custom Definitions and remove any that are unused, redundant, or were created experimentally and never built upon. Do this before you hit the limit, not in a crisis after you do.
Duplicate data streams
Properties sometimes end up with multiple web data streams — the result of a migration, an agency adding a new stream without removing the old one, or a redesign that created a parallel tracking setup. Multiple active streams can result in duplicate event collection if both measurement IDs fire on the same pages. Check Admin → Data Streams and verify there is exactly one active web stream with recent data.
Category 7: Sampling and Thresholding
Exploration sampling
GA4 Explorations sample data when queries are complex or date ranges are long. Sampled reports show an orange icon in the top right corner of the exploration, with the sample percentage displayed. A 5% sample means 95% of your data was not used to generate the report. For high-stakes analysis, reduce the date range, simplify the exploration, or export raw data to BigQuery where sampling does not apply.
Reporting thresholding
When Google Signals is enabled and a dimension combination represents fewer than a threshold number of users, GA4 removes those rows from reports to protect user privacy. This appears as an orange exclamation triangle in the report header. Thresholding typically affects reports with granular demographic breakdowns. If thresholding is obscuring data you need, consider switching to Device-based reporting identity in Admin → Reporting Identity, or disable Google Signals for reporting purposes — though this affects cross-device attribution.
Cardinality limits
GA4 groups low-frequency dimension values into an "(other)" row when the number of unique values exceeds the cardinality limit. This is common with high-cardinality dimensions like page URLs with query parameters, campaign names with many variants, or custom dimensions with large numbers of unique values. If "(other)" represents a significant percentage of your data for an important dimension, investigate what's driving the cardinality. Strip unnecessary URL parameters at the stream level or consolidate dimension values at source.
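To see how much of the cardinality comes from query parameters, page URLs can be canonicalised offline and the unique counts compared. This is a sketch only: the `KEEP_PARAMS` allow-list is hypothetical and should hold whichever parameters genuinely change page content on your site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allow-list of parameters that change what the page shows.
KEEP_PARAMS = {"q", "category"}

def canonical_page(url, keep=KEEP_PARAMS):
    """Drop the fragment and every query parameter not in the allow-list."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in keep
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "https://www.example.com/search?q=shoes&sessionid=abc123",
    "https://www.example.com/search?q=shoes&sessionid=def456#top",
    "https://www.example.com/product/42?ref=a1",
    "https://www.example.com/product/42?ref=b2",
]
before = len(set(urls))
after = len({canonical_page(u) for u in urls})
print(f"unique page values: {before} before, {after} after")
```

A large drop between the two counts suggests that parameter stripping at the stream level would bring the dimension back under the limit.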
How to audit your property across all seven categories
A thorough manual audit working through each category above takes an experienced analyst four to eight hours — checking Admin settings, analysing acquisition and hostname reports, validating events in DebugView, and cross-referencing GA4 data against source-of-truth systems like your e-commerce platform or CRM.
GA4 Health Check automates this entire process. Our 47-point audit covers all seven categories, runs in parallel across 7 modules, and delivers a scored PDF report in under 60 seconds. Every finding is prioritised by severity and includes specific remediation steps — so you know not just what's broken, but exactly how to fix it.
- Data retention setting and BigQuery export status
- Hostname verification for staging and development traffic
- UTM medium consistency analysis across your date range
- Payment processor referral exclusion check
- Conversion event activity verification — zombie event detection
- E-commerce funnel completeness across all 7 standard events
- Consent Mode v2 signal detection and status
- PII scanning in page URLs and event parameters
- Custom dimension quota check
- Duplicate data stream detection
- And 37 additional checks across all seven categories
