8 Performance Challenges in Production
Data skew
Low-cardinality GROUP BY + high-cardinality COUNT(DISTINCT)
Two-stage deduplication:
SELECT day, COUNT(DISTINCT user_id)
FROM T
GROUP BY day
is automatically rewritten into:
SELECT day, SUM(cnt)
FROM (
    SELECT day, COUNT(DISTINCT user_id) AS cnt
    FROM T
    GROUP BY day, MOD(HASH_CODE(user_id), 1024)
)
GROUP BY day
Slow nodes
Dynamic work rebalancing
P99 latency
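To make the slow-node effect concrete, here is a small simulation under simplifying assumptions: work is divisible into equal units, and each worker has a fixed speed. It compares a static, up-front split with a pull-based scheme in which idle workers fetch the next chunk. This is a simplified stand-in for dynamic work rebalancing (real systems may also split a straggler's remaining range), and all names are hypothetical.

```python
import heapq


def static_makespan(num_units, speeds):
    # Every worker receives an equal share up front;
    # the slowest worker determines the finish time.
    share = num_units / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(num_units, speeds, chunk=1):
    # Workers pull the next chunk whenever they become free,
    # so a slow worker simply ends up processing fewer chunks.
    free_at = [(0.0, i) for i in range(len(speeds))]  # (time worker is free, worker id)
    heapq.heapify(free_at)
    remaining = num_units
    finish = 0.0
    while remaining > 0:
        now, worker = heapq.heappop(free_at)
        take = min(chunk, remaining)
        remaining -= take
        done = now + take / speeds[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish


# Three normal workers and one running at a quarter of the speed.
speeds = [1.0, 1.0, 1.0, 0.25]
static_time = static_makespan(400, speeds)
dynamic_time = dynamic_makespan(400, speeds)
```

With a static split the slow worker holds back the whole job. With pull-based scheduling the finish time stays close to total work divided by aggregate speed, which is why rebalancing mainly helps tail latency such as P99.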
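Resource isolation
Queries, ingestion, compaction, schema changes, statistics collection, data balancing, and other task types compete for resources and interfere with one another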
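One simple form of isolation is a separate concurrency budget per task type, so that background work such as compaction cannot consume the slots that queries need. The sketch below assumes this slot-based model. `WorkloadLimiter` and the slot numbers are hypothetical, and it does not describe how any particular engine implements isolation.

```python
import threading


class WorkloadLimiter:
    """Hypothetical per-task-type concurrency limiter (illustration only)."""

    def __init__(self, slots):
        # One independent semaphore per task type, e.g. {"query": 8, "compaction": 2}.
        self._sems = {kind: threading.BoundedSemaphore(n) for kind, n in slots.items()}

    def try_acquire(self, kind):
        # Non-blocking: returns False when this task type has used up its own budget.
        return self._sems[kind].acquire(blocking=False)

    def release(self, kind):
        # Return a slot to the budget of the given task type.
        self._sems[kind].release()


limiter = WorkloadLimiter({"query": 2, "compaction": 1})
first_compaction = limiter.try_acquire("compaction")   # takes the only compaction slot
second_compaction = limiter.try_acquire("compaction")  # rejected: budget exhausted
query_admitted = limiter.try_acquire("query")          # unaffected by compaction load
```

Because each task type draws from its own budget, a burst of compaction or ingestion work is throttled without reducing query admission. Limiting concurrency is only a coarse proxy for limiting CPU, memory, and I/O.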