Beginner’s Guide to Working with GA4 Data in BigQuery
2025-01-17
By: Anton Gezelius
Co-Founder & Team Lead
data:image/s3,"s3://crabby-images/e6a19/e6a194417d82bddb69645152f9c6e5c6803bba8a" alt="Feature image"
Introduction
Around 10-15 years ago, having specialized expertise in Google Analytics was a major competitive advantage in digital analytics. The ability to configure goals, create custom events, and understand the mechanics behind dimensions and metrics could make professionals indispensable in many organizations. Simply logging into Google Analytics or adding an event using Google Tag Manager could elevate one’s status significantly.
Today, the landscape of digital analytics has shifted significantly.
Over the past five years, it has become clear that expertise in Google Analytics alone is no longer sufficient to remain competitive in the job market. Many job postings now explicitly require SQL skills, often alongside familiarity with data warehouse solutions like Google BigQuery.
This blog post provides a step-by-step guide for anyone who hasn’t yet started using SQL. The goal is to help build confidence in working with the Google Analytics 4 BigQuery export. It can be challenging to get started without guidance, but this guide will simplify the process.
Before diving into the steps, let’s explore why SQL has become an essential skill.
Why This Increasing Demand for SQL Competence?
There are two primary reasons for the rising demand for SQL skills:
- Increased Market Skill Levels: Companies no longer need help with basic analytics tasks, like determining the percentage of traffic from mobile devices. These tasks are now easily handled by employees across various roles. Instead, companies need solutions for more complex analytical challenges.
- The Evolving Digital Landscape: Modern digital ecosystems span multiple devices and systems. For instance, a streaming service might run campaigns on a website where users create accounts with a free trial, then encourage users to download an app to access the service. Revenue generation often happens after the trial period ends and backend systems process credit card charges. Measuring key metrics like ROAS (Return on Ad Spend) or CLV (Customer Lifetime Value) in such a setup requires navigating multiple interconnected systems.
These complexities were rare 10–15 years ago when a “complex” implementation often involved basic revenue tracking on a checkout page. Today, digital analysts must master advanced measurement techniques, privacy considerations, and working with diverse data sources. SQL may not solve all these challenges, but it often plays a critical role in addressing them.
After all, a digital analyst is a form of data analyst, and data analysts have one native language: SQL.
The Guide
To get started with SQL and Google BigQuery, the following steps are recommended. While several steps can be completed in parallel, they are presented in a logical progression:
1. Create a GCP Account
Google BigQuery is a data warehousing solution within the Google Cloud Platform (GCP). Create an account and start using the free tier. Visit the Google Cloud Console to sign up and explore.
2. Access BigQuery Public Datasets
Google offers access to numerous public datasets, ideal for practicing SQL. Beginners are encouraged to start with simpler datasets rather than diving directly into the Google Analytics 4 (GA4) schema, which is complex.
For example, the austin_crime
dataset contains real-world crime statistics from the Austin Police Department. Exploring this dataset can build confidence in SQL fundamentals. Other available datasets include topics like sports, crime statistics, and COVID-19 epidemiological data.
In the picture below, I query the volume and clearance rates for theft across different police districts in Austin.
data:image/s3,"s3://crabby-images/7a0a7/7a0a7e7cb93a7f01b1c5fdb0186eb2b6d2714313" alt=""
To get started, refer to Google’s official guide on querying public datasets.
In BigQuery, a dataset is a container grouping related tables and objects like views. A table stores data in rows and columns. Think of a dataset as a folder and its tables as files holding the actual data. Understanding this structure is crucial when navigating BigQuery.
3. Learn SQL Basics
As you explore public datasets, study basic SQL concepts. Resources like W3Schools provide an excellent starting point. Focus on understanding the following:
- Intro, Syntax: Foundational knowledge of SQL query structure.
- SELECT / SELECT DISTINCT: Retrieving data and removing duplicates.
- WHERE: Filtering rows based on conditions.
- ORDER BY: Sorting query results.
- AND / OR / NOT: Logical operators for filtering.
- NULL Values: Handling missing or undefined data.
- Aggregate Functions (MIN, MAX, COUNT, SUM, AVG): Summarizing data.
- LIKE / Wildcards: Searching within text columns.
- IN / BETWEEN: Filtering by multiple values or ranges.
- Aliases: Renaming columns or tables for readability.
- Joins (INNER, LEFT, RIGHT, FULL): Combining data across related tables.
- GROUP BY / HAVING: Grouping data for analysis and filtering grouped results.
- CASE (optional): Conditional logic within queries.
- UNION (optional): Combining results from multiple queries.
Once you are comfortable with these basics, it is recommended to explore advanced concepts like subqueries and Common Table Expressions (CTEs).
4. Explore the Google Analytics 4 BigQuery Dataset
With a solid understanding of SQL basics, begin working with GA4 data in BigQuery. Note that GA4’s schema includes nested and repeated fields, requiring additional functions (e.g., UNNEST) to access and analyze data. Beginners should approach this dataset only after gaining experience with simpler datasets.
If you haven’t already set up the integration between GA4 and Google BigQuery, follow this guide.
Useful resources include:
- GA4BigQuery.com by Johan van de Werken
- Google’s library of queries for GA4
5. Elevate with ChatGPT
Tools like ChatGPT are invaluable for debugging and generating SQL code. Examples of effective prompts include:
- “Produce SQL code that formats the field ‘week_day’ as ‘YYYY-MM-DD,’ given current values like ‘Week 1, Tuesday, 2025’ in Google BigQuery.”
- “Identify and fix errors in this SQL code [CODE].”
- “Simplify the following SQL code: [CODE].”
ChatGPT can also serve as a personal trainer, providing tailored exercises for specific datasets.
Moving Forward
After consistent practice, analysts can leverage BigQuery to answer complex analysis questions rather than relying solely on the GA4 interface. Although the GA4 interface is convenient for quick answers, consider challenging yourself to find solutions in BigQuery instead.
Next steps for continued growth include:
- Scheduling queries
- Visualizing data in Looker Studio
- Streamlining workflows with tools like Dataform
Final Thoughts
Starting with SQL can be intimidating, but practicing with public datasets and foundational concepts builds confidence. Avoid the common mistake of diving into complex tracking data without first mastering basic queries.
For teams looking to elevate their analytics capabilities, Ctrl Digital offers full-day courses covering these steps with dedicated code lab sessions to practice and apply skills in real-time.
Wishing you success in mastering SQL!