SQL for Active Cohorts

October 24th, 2018


Typical query for tracing out what percent of organizations are still active (members are still using the product) for X day. Key here is the generate_series, which generates separate rows for each value, with all other columns duplicated. This query is written for Postgres, but should be do-able in most databases.

Assumes you have an organizations, users, organization_members and events table, and any event with a user_id is an “active” event. Find and replace ‘week’ with whatever interval you want.

-- first 4 weeks
with org_weeks as (
    select id, created, 
        generate_series(created,
                        least(now()::timestamp,
                              created + interval '4 weeks'), 
                        interval '1 week') as cohort_date 
    from organizations 
    where created = :daterange
),
org_events as (
    select distinct org_id, created
    from events e
    join organization_members om on e.user_id=om.user_id
),
active_weeks as (
    select ow.id, 
        extract(epoch from ow.cohort_date - ow.created) / 3600 / 24 / 7 as week, 
        count(oe.org_id) as active
    from org_weeks ow
    left join org_events oe
    on ow.id=oe.org_id 
    and oe.created between ow.cohort_date and ow.cohort_date + interval '1 week'
    group by 1, 2)
select week,
    sum(case when active > 0 then 1 end) as actives,
    count(1) as total,
    sum(case when active > 0 then 1 end)::float / count(1) as percent
from active_weeks
group by 1
;