Google recently fixed the missing event parameters issue with the GA4 session_start and first_visit events. Starting from November 2, 2023, the events contain the same parameters as the first event of the session that triggered these events.
While this update fixes one of the problems by making the data more consistent, there are still a lot of issues related to these two events. At best, they are just a bit off; in the worst cases, they are complete junk.
Here’s why:
- number of session_start events ≠ number of sessions
- number of first_visit events = number of new users in GA4, but that number is itself measured incorrectly.
Also, as a bonus, there seem to be some serious issues with the measurement of new vs. returning users in GA4.
Post updates
- 2024-04-22: Corrections to how the first_visit event works, new debug query for verifying issues. Thanks Giovani Ortolani Barbosa and Leonardo Lourenço Crespilho for the comments!
Session_start
GA4 generates the session_start event whenever an event has the _ss parameter. Essentially, it’s a duplicate of the session’s first event with a different event name.
GA4 itself doesn’t use the session_start events anymore for its sessions metric. Instead, the number of sessions in GA4 is an estimate of the number of unique session ids.
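As a rough cross-check of the two counting approaches, the sketch below puts the raw session_start count next to the number of unique session ids. It assumes a session id built from user_pseudo_id and the ga_session_id parameter. The CTE and column names are my own, `<project>.<dataset>` is a placeholder, and approx_count_distinct is BigQuery's own estimator, shown only to illustrate what an estimated count looks like, not as GA4's actual implementation.

```sql
-- Sketch: session_start events vs. unique session ids (exact and estimated).
with events as (
  select
    event_name,
    concat(
      user_pseudo_id,
      cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)
    ) as session_id
  from
    `<project>.<dataset>.events_*`
)

select
  countif(event_name = 'session_start') as session_start_events,
  -- count(distinct ...) ignores rows where session_id is null
  count(distinct session_id) as unique_session_ids,
  approx_count_distinct(session_id) as estimated_session_ids
from
  events
```

If session_start were reliable, the first two numbers would match.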
The session_start event depends entirely on the client-side script’s ability to include the _ss flag in the event. This client-side evaluation is not flawless. Sometimes, the script incorrectly triggers multiple session_start events for a single session.
As explained by Simo in Measure Slack:
In most cases, the number of session_start events should be close to the number of sessions. However, there can be some discrepancy.
By digging deeper using BigQuery data, we see that some sessions include multiple session_start events, while some don’t have any session_start events at all.
I got the above result with a query that didn’t include any time range filter, which avoids miscounting sessions that span more than one day.
You can test this with your own GA4 property’s data by using the following query.
select
  -- session id = user_pseudo_id + ga_session_id
  concat(
    user_pseudo_id,
    (select value.int_value from unnest(event_params) where key = 'ga_session_id')
  ) as session_id,
  countif(event_name = 'session_start') as session_start_events
from
  `<project>.<dataset>.events_*`
group by
  session_id
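To summarise the per-session output, a minimal sketch could bucket the sessions by how many session_start events each one contains. It reuses the session_id definition from the query above; the CTE name session_counts and the output column names are my own, and `<project>.<dataset>` is a placeholder.

```sql
-- Sketch: how many sessions have 0, 1, 2, ... session_start events.
with session_counts as (
  select
    concat(
      user_pseudo_id,
      cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)
    ) as session_id,
    countif(event_name = 'session_start') as session_start_events
  from
    `<project>.<dataset>.events_*`
  group by
    session_id
)

select
  session_start_events,
  count(*) as sessions
from
  session_counts
where
  -- events without a ga_session_id parameter yield a null session_id; leave them out
  session_id is not null
group by
  session_start_events
order by
  session_start_events
```

Rows with session_start_events = 0 are sessions missing the event, and rows with 2 or more are sessions where it fired more than once.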
Missing session_start events are especially common with sub-properties, even after the November 2, 2023 update.
Sub-properties allow you to get a filtered view of a larger entity. It can often be the case that the session’s first event has already happened before viewing the content included in the sub-property. Because the evaluation is done client-side, not in the GA4 property, the session_start event will be missing.
The above Exploration report uses a session segment with the following criteria.
A proper session_start event would make it easy, for example, to access the session’s first traffic source in BigQuery (without further attribution) by using a simple where clause. However, as the event is so unreliable, it’s still best to just get this information from the first event of the session.
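Getting the traffic source from the session’s first event could look roughly like the sketch below. It assumes the source and medium are available as the source and medium event parameters, which depends on how the property collects data. Leaving out session_start and first_visit is my own choice, since these are the events this post considers unreliable. The CTE and column names are hypothetical.

```sql
-- Sketch: read the traffic source from each session's first event.
with events as (
  select
    concat(
      user_pseudo_id,
      cast((select value.int_value from unnest(event_params) where key = 'ga_session_id') as string)
    ) as session_id,
    event_timestamp,
    event_name,
    (select value.string_value from unnest(event_params) where key = 'source') as source,
    (select value.string_value from unnest(event_params) where key = 'medium') as medium
  from
    `<project>.<dataset>.events_*`
  where
    -- assumption: skip the two automatically generated events
    event_name not in ('session_start', 'first_visit')
)

select
  session_id,
  -- order by timestamp and keep only the earliest event of the session
  array_agg(
    struct(event_name, source, medium)
    order by event_timestamp
    limit 1
  )[offset(0)] as first_event
from
  events
where
  session_id is not null
group by
  session_id
```

If several events share the earliest timestamp, the one picked is arbitrary, so a stricter version might add a tie-breaker to the order by clause.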
First_visit
GA4 uses the first_visit events for measuring new users. As with the session_start event, this event is also based on a flag added by the client-side tracking code. This flag is called _fv.
But does the new and returning users measurement work?
You would think that adding up new users + returning users would give you roughly the same number as total users. After all, shouldn’t each user belong in one or the other bucket? Of course, a new user could later turn into a returning user. So, depending on the logic used in the calculation, there could be a case where the same user is counted twice. However, at least all users should either be new or returning.
Well, that is not the case in GA4. In the worst cases, most of the users are neither new nor returning.
The reason behind this discrepancy is that GA4 uses an entirely different logic for the returning users metric compared to the new users metric. In GA4, a returning user is one who had at least one preceding session before the current session.
The new users metric and the first_visit event, on the other hand, are based on a simplistic cookie check done on the client side. GA4 checks for the existence of the client id (_ga) cookie and the _ga_<Measurement Id> cookie. If either cookie is missing, the tracking script adds the _fv flag to the event.
The evaluation falls short with sub-properties, which are based on a filtered set of events instead of having their own stream. However, there can also be issues in regular properties.
The query below checks whether each user_pseudo_id has logged at least one first_visit event.
with event_data as (
  select
    user_pseudo_id,
    -- true if the user has at least one first_visit event
    logical_or(event_name = 'first_visit') as user_has_first_visit
  from
    `<project>.<dataset>.events_*`
  group by
    1
)

select
  user_has_first_visit,
  count(distinct user_pseudo_id) as users
from
  event_data
group by
  1
Sometimes, the results can look like this:
I don’t know exactly what is behind these issues. However, they seem to occur mainly with properties that share the same parent domain with other GA-tracked sites and GA4 properties.
Below are the results of the same query using my blog’s GA4 data.
So, we can conclude that the number of new users is sometimes too low. What about the returning users? Looking at the data, it doesn’t quite seem to work as documented.
GA4 automatically tracks the user’s session number. The session number dimension is unavailable in the UI, but we can access it in BigQuery. The evaluation should be as simple as checking if the ga_session_number parameter is greater than one.
select
  -- count each user who has at least one event with ga_session_number > 1
  count(
    distinct
    if(
      (select value.int_value from unnest(event_params) where key = 'ga_session_number') > 1,
      user_pseudo_id,
      null
    )
  ) as returning_users
from
  `<project>.<dataset>.events_*`
Let’s calculate the same number as shown earlier using BigQuery.
This GA4 property is configured to use the device-based reporting identity. Regardless of that, the numbers are not even close, even though the logic should be the same.
Doing the same comparison on a property that doesn’t share the same domain with another property gives me a much smaller but still very notable difference.
These findings lead me to believe the returning users metric doesn’t work as documented.
Interestingly, the SQL query utilizing the ga_session_number parameter gives results very close to the new and returning user counts in Universal Analytics. Based on that, this method seems like the most accurate way to get this data.
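Extending the same idea, a sketch like the one below could return total, new and returning users in one pass. Treating ga_session_number = 1 as "new" is my assumption, chosen as the mirror image of the returning users check; the CTE and column names are hypothetical.

```sql
-- Sketch: classify users by ga_session_number within the queried tables.
with user_sessions as (
  select
    user_pseudo_id,
    (select value.int_value from unnest(event_params) where key = 'ga_session_number') as session_number
  from
    `<project>.<dataset>.events_*`
)

select
  count(distinct user_pseudo_id) as total_users,
  -- assumption: a user is "new" if any of their events belongs to session number 1
  count(distinct if(session_number = 1, user_pseudo_id, null)) as new_users,
  count(distinct if(session_number > 1, user_pseudo_id, null)) as returning_users
from
  user_sessions
```

A user whose first and later sessions both fall inside the queried tables is counted as both new and returning, so new_users + returning_users can exceed total_users, but with this logic every user who has a session number lands in at least one bucket.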
Final thoughts
Because of the above issues, I’ve been avoiding using the session_start and first_visit events for any analysis. Fortunately, the BigQuery export allows more accurate ways to get the session’s first event or the user’s returning vs. new status than using these two events.
Both events rely on logic that happens on the client side. That is a weird design because there are cases, as described in this post, where this method fails miserably.
Finally, when it comes to the new and returning users metrics in GA4, it’s almost as if these two were developed by two isolated teams. For these two to be consistent, shouldn’t they at least follow the same logic?
This post is a collection of issues related to these two events I’ve encountered while working with GA4. Please let me know if I’ve missed something!
Hi Taneli. How are you?
I think the following cookie/first_visit “problem” doesn’t exist:
> Multiple properties setup for sites that are under the same parent domain
The first_visit is sent not when the _ga cookie gets created, but when the _ga_MEASUREMENT-ID one is created. So, it does not matter if multiple properties run on the same domain. Does that make sense?
Regards.
Hi Leonardo,
Yes, you are right. It’s also based on the _ga_MEASUREMENT-ID. If you already have the _ga cookie, but the _ga_MEASUREMENT-ID cookie is missing, then the param gets added. But also, if the _ga_MEASUREMENT-ID cookie is there but the _ga cookie is missing for some reason, then the _fv param is also added.
I think my tests for that case were a bit too simple.
I actually found this out a while ago but haven’t managed to update the blog yet. Will do that next. Thanks!
Hi Taneli, how are you doing?
First of all, thank you for the article. A super interesting topic that I haven’t found anywhere apart from here.
I actually got to this article with a different problem, but it is also related to the “session_start” event.
Since I implemented our cookie banner (via GA4), the “session_start” event doesn’t fire anymore in GA4. After some research, I found out that the “session_start” event is responsible for session attribution (utm parameters and so on), which is the reason I suddenly have a high amount of unassigned traffic in my GA4.
Do you have any idea why this is happening? page_view events and the rest work perfectly, and both the GA4 tag and the cookie-banner tag are implemented with the standard triggers (GA4 is using the trigger “all pages” and our cookie-banner tag is using “Consent initialization – all pages”).
Thanks so much for your help!