You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A cohort analysis groups users based on certain traits and compares their behavior across groups and over time.
The most common way of grouping users is by when they first joined. This definition will change for each company and even use case but it's generally taken to be the day or week that a user joined. All users who joined at the same time are grouped together. Again, this can be made more complex if needed.
The simplest way to start comparing cohorts is seeing if they returned to your site/product in the weeks after they joined. You can continue to refine what patterns of behavior you're looking for.
But for this example, we'll do the following:
Assign users a cohort based on the week they created their account
For each cohort determine the % of users that returned for the 10 weeks after joining
The Data
The way your data is structured will greatly determine how you do your analysis. Most commonly, there's a transaction table that has timestamped events such as 'create account' or 'purchase' for each user. This is what we'll use for our example.
In this example I've gone step by step, but to do it all in one query you could try:
WITH cohorts as
(
SELECT
fullVisitorId,
date_trunc(first_value(date) over (partition by fullVisitorId order bydateasc),week) cohort
FROM data
),
activity as
(
SELECTcohorts.cohort,
date_diff(date,cohort,week) weeks_since_joining,
count(distinct data.fullVisitorId) visitors
FROM
cohorts
LEFT JOIN data ONcohorts.fullVisitorId=data.fullVisitorIdWHERE
date_diff(date,cohort,week) between 1and10GROUP BY cohort, weeks_since_joining
),
cohorts_with_weeks as
(
SELECT distinct
cohort,
count(distinct fullVisitorId) cohort_size,
weeks_since_joining
FROM cohorts
CROSS JOIN
UNNEST(generate_array(1,10)) weeks_since_joining
GROUP BY cohort, weeks_since_joining
),
SELECTcohorts_with_weeks.cohort,
cohorts_with_weeks.weeks_since_joining,
cohorts_with_weeks.cohort_size,
coalesce(activity.visitors,0)/cohorts_with_weeks.cohort_size per_active,
FROM
cohorts_with_weeks
LEFT JOIN activity
ONcohorts_with_weeks.cohort=activity.cohortANDcohorts_with_weeks.weeks_since_joining=activity.weeks_since_joining
Data Preview
fullVisitorId
visitNumber
visitId
date
...
channelGrouping
6453870986532490896
1
1473801936
2016-09-13
...
Display
2287637838474850444
5
1473815616
2016-09-13
...
Direct
1. Assign each user a cohort based on the week they first appeared
--cohortsselect
fullVisitorId,
date_trunc(first_value(date) over (partition by fullVisitorId order bydateasc),week) cohort
from data
fullVisitorId
cohort
0002871498069867123
21/08/2016
0020502892485275044
11/09/2016
....
....
2. For each cohort find the number of users that have returned for weeks 1-10
--activityselectcohorts.cohort,
date_diff(date,cohort,week) weeks_since_joining,
count(distinct data.fullVisitorId) visitors
from
cohorts
left join data oncohorts.fullVisitorId=data.fullVisitorIdwhere date_diff(date,cohort,week) between 1and10group by
cohort, weeks_since_joining
cohort
weeks_since_joining
visitors
2016-07-31
2
7
2016-07-31
1
6
2016-07-31
9
1
2016-07-31
5
1
....
....
....
3. Turn that into a percentage of users each week (being sure to include weeks where no users returned)
--cohorts_with_weeksselect distinct
cohort,
count(distinct fullVisitorId) cohort_size,
weeks_since_joining
from cohorts
cross join
unnest(generate_array(1,10)) weeks_since_joining
group by cohort, weeks_since_joining
cohort
cohort_size
weeks_since_joining
2016-08-21
263
1
2016-08-21
263
2
2016-08-21
263
3
2016-08-21
263
4
2016-08-21
263
5
2016-08-21
263
6
2016-08-21
263
7
2016-08-21
263
8
2016-08-21
263
9
2016-08-21
263
10
...
...
...
--cohort_tableSelectcohorts_with_weeks.cohort,
cohorts_with_weeks.cohort_size,
cohorts_with_weeks.weeks_since_joining,
coalesce(activity.visitors,0)/cohorts_with_weeks.cohort_size per_active,
from
cohorts_with_weeks
left join activity oncohorts_with_weeks.cohort=activity.cohortandcohorts_with_weeks.weeks_since_joining=activity.weeks_since_joining