CodeVisium

332 subscribers

31 views ・ 2 likes ・ 2025/05/16

1. Detecting Session Boundaries

LAG() fetches the previous event's timestamp within each user's events (partitioned by user_id, ordered by event_time), so no self-join or correlated subquery is needed to look it up.

We flag a new session when:

There is no prior event (LAG(...) IS NULL), or

The time gap exceeds our threshold (e.g., 600 seconds)
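As a rough illustration of the two conditions above, here is a minimal Python sketch that mirrors what LAG() plus the CASE expression compute for a single user. The function name flag_new_sessions, the sample timestamps, and the use of plain epoch seconds are assumptions made for the example; they are not part of the original queries.

```python
def flag_new_sessions(event_times, threshold_seconds=600):
    """Return 1 for events that start a new session, else 0.

    event_times: one user's event timestamps in epoch seconds, sorted ascending.
    """
    flags = []
    prev_time = None  # plays the role of LAG(event_time): None for the first event
    for current in event_times:
        if prev_time is None or current - prev_time > threshold_seconds:
            flags.append(1)  # no prior event, or the gap exceeds the threshold
        else:
            flags.append(0)  # the current session continues
        prev_time = current
    return flags


# Hypothetical sample: gaps of 300 s, 900 s, and 60 s
print(flag_new_sessions([0, 300, 1200, 1260]))  # [1, 0, 1, 0]
```

Note that a gap of exactly 600 seconds does not start a new session, because the comparison is strictly greater than the threshold.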

2. Assigning Session Identifiers

A running sum (SUM(...) OVER (...)) adds 1 for each new-session flag and 0 otherwise, so the total increments at every session boundary and numbers sessions sequentially per user.
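The running sum can be pictured with a small Python sketch, assuming the 0/1 flags for one user have already been computed in event order. Here itertools.accumulate stands in for SUM(...) OVER (PARTITION BY user_id ORDER BY event_time); the flag values are made up for illustration.

```python
from itertools import accumulate

# Hypothetical new-session flags for one user's events, in time order
flags = [1, 0, 0, 1, 0, 1]

# A running total: each 1 moves on to the next session number
session_ids = list(accumulate(flags))
print(session_ids)  # [1, 1, 1, 2, 2, 3]
```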

3. Performance & Portability

Single Scan: The window-function version reads the events table once and needs no self-join.

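ANSI-SQL Standard: The window functions used (LAG, SUM OVER) are standard and supported in PostgreSQL, SQL Server, Oracle, BigQuery, Snowflake, and MySQL 8.0+. The gap calculation with EXTRACT(EPOCH FROM ...) is PostgreSQL syntax, so other engines need their own date-difference function for that part.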

Queries:

✅ Long Way (Self-Join & Subqueries):

SELECT
  e1.user_id,
  e1.event_time,
  e1.event_type,
  -- e2.event_time is the previous event's timestamp (NULL for a user's first event)
  SUM(CASE WHEN e2.event_time IS NULL
            OR EXTRACT(EPOCH FROM (e1.event_time - e2.event_time)) > 600
           THEN 1 ELSE 0 END
  ) OVER (PARTITION BY e1.user_id ORDER BY e1.event_time) AS session_id  -- running total of flags
FROM (
  SELECT *,
    LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
  FROM events
) e1
LEFT JOIN (
  SELECT user_id, event_time
  FROM events
) e2
  ON e1.user_id = e2.user_id
  AND e2.event_time = e1.prev_time
ORDER BY e1.user_id, e1.event_time;

We first compute each event's previous timestamp per user via LAG().

A self-join then aligns e1 rows with their prev_time in e2 to access the actual prior event record.

We use a CASE to flag a new session when there is no previous event or the gap exceeds 600 seconds (10 minutes).

Finally, we sum these flags across each user’s ordered events to assign incremental session_id values.

✅ Shortcut One-Liner (Window Functions Only):

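-- Window functions cannot be nested, so the flag is computed in a CTE first.
WITH flagged AS (
  SELECT
    user_id,
    event_time,
    event_type,
    CASE
      WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
        OR EXTRACT(EPOCH FROM (event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time))) > 600
      THEN 1
      ELSE 0
    END AS is_new_session
  FROM events
)
SELECT
  user_id,
  event_time,
  event_type,
  SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM flagged
ORDER BY user_id, event_time;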

We use LAG(event_time) OVER (...) twice: once to detect NULL (first event) and once to compute the inter-event gap.

The inner CASE ... END returns 1 for a new session boundary and 0 otherwise.

Summing that flag with SUM(...) OVER (PARTITION BY user_id ORDER BY event_time) produces a running total of session flags. The result is a session_id that numbers each user's sessions 1, 2, 3, and so on, all in one statement.
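To try the idea end to end without a database server, the following sketch runs the same flag-then-running-sum pattern in SQLite through Python's sqlite3 module. It is an adaptation under stated assumptions rather than the PostgreSQL query itself: SQLite has no EXTRACT(EPOCH FROM ...), so the gap is computed with strftime('%s', ...), and the flag is computed in a CTE before the running sum is applied. The table contents and the names flagged and is_new_session are invented for the example, and SQLite 3.25 or newer is assumed for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2025-05-16 10:00:00", "view"),
        (1, "2025-05-16 10:05:00", "click"),  # 5 min gap: same session
        (1, "2025-05-16 10:30:00", "view"),   # 25 min gap: new session
        (2, "2025-05-16 11:00:00", "view"),
        (2, "2025-05-16 11:10:00", "click"),  # exactly 600 s: same session
    ],
)

query = """
WITH flagged AS (
    SELECT
        user_id,
        event_time,
        CASE
            WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
              OR CAST(strftime('%s', event_time) AS INTEGER)
                 - CAST(strftime('%s', LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)) AS INTEGER) > 600
            THEN 1
            ELSE 0
        END AS is_new_session
    FROM events
)
SELECT
    user_id,
    event_time,
    SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM flagged
ORDER BY user_id, event_time
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (user_id, event_time, session_id)
```

With this sample data, user 1 ends up with sessions 1, 1, 2 and user 2 with sessions 1, 1, since session numbering restarts for each user.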
