CodeVisium
33 views ・ 2 likes ・ 2025/05/16
1. Detecting Session Boundaries
LAG() returns the previous event's timestamp for the same user (partitioned by user_id, ordered by event_time), so no self-join or correlated subquery is needed to look it up.
We flag a new session when:
There is no prior event (LAG(...) IS NULL), or
The time gap exceeds our threshold (e.g., 600 seconds)
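The boundary rule above can be tried on a handful of rows. The following is a minimal sketch, not the video's own code: it runs the flag expression in an in-memory SQLite database (3.25 or newer, for window functions) through Python's sqlite3 module. The sample rows and the is_new_session alias are hypothetical, and event_time is assumed to be stored as epoch seconds so the gap is a plain subtraction.

```python
import sqlite3

# Hypothetical sample data: (user_id, event_time in epoch seconds, event_type).
ROWS = [
    (1, 0, "login"), (1, 120, "view"), (1, 900, "view"), (1, 1000, "click"),
    (2, 50, "login"), (2, 700, "view"),
]

# Flag = 1 when there is no prior event for the user or the gap exceeds 600 s.
FLAG_SQL = """
SELECT user_id, event_time,
       CASE WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
              OR event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) > 600
            THEN 1 ELSE 0 END AS is_new_session
FROM events
ORDER BY user_id, event_time
"""

def session_flags(rows):
    """Load rows into a throwaway table and return (user_id, event_time, flag)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_time INTEGER, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(FLAG_SQL).fetchall()
    conn.close()
    return result

for row in session_flags(ROWS):
    print(row)
```

For user 1 the gaps are 120, 780 and 100 seconds, so only the first event and the event at 900 are flagged.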
2. Assigning Session Identifiers
A running sum (SUM(...) OVER (...)) treats each flag as 1 to increment the session count and 0 to maintain the current session, effectively numbering sessions sequentially per user
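To see the running sum in isolation, the sketch below starts from flags that are already computed and only applies SUM(...) OVER (...). It is an illustration under assumptions: the flagged table, its is_new_session column, and the sample values are hypothetical, and SQLite 3.25 or newer is used just to make it runnable.

```python
import sqlite3

# Hypothetical precomputed flags: (user_id, event_time, is_new_session).
FLAGS = [
    (1, 0, 1), (1, 120, 0), (1, 900, 1), (1, 1000, 0),
    (2, 50, 1), (2, 700, 1),
]

# Each 1 bumps the running total; each 0 leaves it unchanged.
RUNNING_SUM_SQL = """
SELECT user_id, event_time,
       SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM flagged
ORDER BY user_id, event_time
"""

def number_sessions(flags):
    """Return (user_id, event_time, session_id) for the given flags."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flagged (user_id INTEGER, event_time INTEGER, is_new_session INTEGER)"
    )
    conn.executemany("INSERT INTO flagged VALUES (?, ?, ?)", flags)
    result = conn.execute(RUNNING_SUM_SQL).fetchall()
    conn.close()
    return result

for row in number_sessions(FLAGS):
    print(row)
```

The counter restarts for each user because the window is partitioned by user_id.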
3. Performance & Portability
Single Scan: The window-function version reads the events table once and needs no self-join
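ANSI-SQL Standard: LAG and SUM ... OVER are standard window functions, supported in PostgreSQL, SQL Server, Oracle, BigQuery, Snowflake, and MySQL 8.0+. The gap calculation is the less portable part: EXTRACT(EPOCH FROM ...) on a timestamp difference is PostgreSQL syntax, and other engines use their own functions (e.g., DATEDIFF in SQL Server and Snowflake, TIMESTAMP_DIFF in BigQuery, TIMESTAMPDIFF in MySQL)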
Queries:
✅ Long Way (Self-Join & Subqueries):
SELECT
  e1.user_id,
  e1.event_time,
  e1.event_type,
  -- Count the session starts at or before this event (a running total via self-join)
  SUM(CASE WHEN e2.prev_time IS NULL
             OR EXTRACT(EPOCH FROM (e2.event_time - e2.prev_time)) > 600
           THEN 1 ELSE 0 END
  ) AS session_id
FROM events e1
JOIN (
  -- Every event with the timestamp of the same user's previous event
  SELECT user_id, event_time,
         LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
  FROM events
) e2
  ON e2.user_id = e1.user_id
 AND e2.event_time <= e1.event_time
GROUP BY e1.user_id, e1.event_time, e1.event_type
ORDER BY e1.user_id, e1.event_time;
We first compute each event's previous timestamp per user via LAG() in the derived table e2.
A self-join then pairs each e1 row with every e2 row for the same user whose event_time is at or before its own.
We use a CASE to flag an e2 row as a session start when it has no previous event or the gap exceeds 600 seconds (10 minutes).
Finally, we sum these flags for each e1 row: the number of session starts up to and including that event is its session_id. This assumes (user_id, event_time, event_type) identifies a row; exact duplicates would be merged by the GROUP BY and counted twice.
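The self-join idea, counting session starts at or before each event, can be checked on a small table. This is a sketch under assumptions rather than the video's exact query: it uses SQLite 3.25+ through Python's sqlite3, hypothetical sample rows, and event_time stored as epoch seconds, so the gap is a subtraction instead of EXTRACT(EPOCH FROM ...). It also counts the joined row pairs, which shows why this route does more work than a single windowed pass.

```python
import sqlite3

# Hypothetical sample data: (user_id, event_time in epoch seconds, event_type).
ROWS = [
    (1, 0, "login"), (1, 120, "view"), (1, 900, "view"), (1, 1000, "click"),
    (2, 50, "login"), (2, 700, "view"),
]

# Each e1 row is joined to all earlier-or-equal rows of the same user;
# summing their session-start flags gives the session number.
SELF_JOIN_SQL = """
SELECT e1.user_id, e1.event_time, e1.event_type,
       SUM(CASE WHEN e2.prev_time IS NULL OR e2.event_time - e2.prev_time > 600
                THEN 1 ELSE 0 END) AS session_id
FROM events AS e1
JOIN (
    SELECT user_id, event_time,
           LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
    FROM events
) AS e2
  ON e2.user_id = e1.user_id AND e2.event_time <= e1.event_time
GROUP BY e1.user_id, e1.event_time, e1.event_type
ORDER BY e1.user_id, e1.event_time
"""

# Number of (e1, e2) pairs the join produces before grouping.
PAIR_COUNT_SQL = """
SELECT COUNT(*)
FROM events AS e1
JOIN events AS e2
  ON e2.user_id = e1.user_id AND e2.event_time <= e1.event_time
"""

def run_self_join(rows):
    """Return (sessionized rows, number of joined pairs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_time INTEGER, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    sessions = conn.execute(SELF_JOIN_SQL).fetchall()
    pairs = conn.execute(PAIR_COUNT_SQL).fetchone()[0]
    conn.close()
    return sessions, pairs

sessions, pairs = run_self_join(ROWS)
for row in sessions:
    print(row)
print("joined pairs:", pairs)
```

With n events for one user and distinct timestamps, the join yields n(n+1)/2 pairs, so the cost grows quadratically per user.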
✅ Shortcut One-Liner (Window Functions Only):
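SELECT
  user_id,
  event_time,
  event_type,
  -- Running total of the flags numbers each user's sessions 1, 2, 3, ...
  SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM (
  SELECT
    user_id,
    event_time,
    event_type,
    CASE
      WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
        OR EXTRACT(EPOCH FROM (event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time))) > 600
      THEN 1
      ELSE 0
    END AS is_new_session
  FROM events
) flagged
ORDER BY user_id, event_time;
The inner query uses LAG(event_time) OVER (...) twice: once to detect NULL (first event) and once to compute the inter-event gap.
Its CASE ... END returns 1 for a new session boundary and 0 otherwise, aliased here as is_new_session.
The outer SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) produces a running total of session flags, yielding a session_id that is unique per session within each user, all in one statement and with no join. The flag is computed in a derived table because a window function cannot be nested directly inside another one; PostgreSQL, for example, rejects that with an error.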
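As a sanity check, the window-only approach can be compared against a plain loop that applies the same rule row by row. This is a hedged sketch, not a reference implementation: it computes the flag in a derived table first (engines such as PostgreSQL and SQLite reject a window function nested inside another), runs on SQLite 3.25+ via Python's sqlite3, and assumes hypothetical sample rows with event_time in epoch seconds. The names sessionize_sql and sessionize_loop are made up for this example.

```python
import sqlite3

# Hypothetical sample data: (user_id, event_time in epoch seconds, event_type).
ROWS = [
    (1, 0, "login"), (1, 120, "view"), (1, 900, "view"), (1, 1000, "click"),
    (2, 50, "login"), (2, 700, "view"),
]
GAP_SECONDS = 600  # inactivity threshold from the text

WINDOW_SQL = """
SELECT user_id, event_time, event_type,
       SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM (
    SELECT user_id, event_time, event_type,
           CASE WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
                  OR event_time - LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) > :gap
                THEN 1 ELSE 0 END AS is_new_session
    FROM events
) AS flagged
ORDER BY user_id, event_time
"""

def sessionize_sql(rows, gap=GAP_SECONDS):
    """Sessionize with window functions only (flag first, then running sum)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, event_time INTEGER, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(WINDOW_SQL, {"gap": gap}).fetchall()
    conn.close()
    return result

def sessionize_loop(rows, gap=GAP_SECONDS):
    """Same rule in plain Python: new session on first event or gap > threshold."""
    last_time, session, out = {}, {}, []
    for user_id, event_time, event_type in sorted(rows, key=lambda r: (r[0], r[1])):
        prev = last_time.get(user_id)
        if prev is None or event_time - prev > gap:
            session[user_id] = session.get(user_id, 0) + 1
        last_time[user_id] = event_time
        out.append((user_id, event_time, event_type, session[user_id]))
    return out

print(sessionize_sql(ROWS) == sessionize_loop(ROWS))
```

Because the comparison is a strict "greater than", a gap of exactly 600 seconds stays in the current session in both versions.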