數(shù)據(jù)分析工作中常用的3類SQL開窗函數(shù)詳解
當(dāng)前位置:點晴教程→知識管理交流
→『 技術(shù)文檔交流 』
在日常工作中,我們經(jīng)常要處理各種數(shù)據(jù)排名、累計求和、環(huán)比等問題。SQL窗口函數(shù)為我們提供了強(qiáng)大的工具來處理這些任務(wù)。 接下來,我們一起看看SQL常見窗口函數(shù)及其應(yīng)用案例吧。 窗口函數(shù)中必須有一個over子句。 01 1.1 函數(shù)說明
1.2 案例說明 這里通過表data_learning.product_order(商品銷量表)、data_learning.product(商品信息表)、data_learning.product_category(商品二級分類信息表)進(jìn)行舉例,兩張表建表語句和示例數(shù)據(jù)插入語句可以查看我之前的文章。 數(shù)據(jù)表示例數(shù)據(jù)分別如下: data_learning.product_order(商品銷量表): data_learning.product(商品信息表): data_learning.product_category(商品二級分類信息表):
問題:請將數(shù)據(jù)集按照商品二級類別(category_id)字段進(jìn)行分組后,根據(jù)上面列舉的窗口函數(shù)并按照sales_volume字段降序返回排名。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,ROW_NUMBER() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ro_result ,RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS rank_result ,DENSE_RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS dense_rank_result ,NTILE(10) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ntile_resultFROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product )bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 結(jié)果如下(部分截圖):
上面的案例顯示,童裝類別分組下,商品毛衣針織衫和兒童羽絨服銷量都是1617,開窗函數(shù)row_number的排序結(jié)果是順延,rank的排序結(jié)果是相同值并列、后續(xù)排序序號不連續(xù),dense_rank的排序結(jié)果是相同值并列、后續(xù)排序序號連續(xù)。 開窗函數(shù)NTILE(n) OVER(PARTITION BY 分組列 ORDER BY 排序列)用于將一個結(jié)果集分割成指定數(shù)量的桶(buckets),并分配一個桶號給每個值。它通常用于在分組數(shù)據(jù)中將數(shù)據(jù)等分為多個組。 在上面的案例中,我們對商品分類分組數(shù)據(jù)集分割成10個桶,桶號從1開始。以童裝類別為例,童裝類別共17行數(shù)據(jù),17不能被10整除,數(shù)據(jù)集無法均分成10個桶,所以結(jié)果集中前面7個桶包含2行數(shù)據(jù),后面3個桶包含1行數(shù)據(jù)。
02 2.1 函數(shù)說明
2.2 案例說明 問題:請將數(shù)據(jù)集按照商品二級類別(category_id)字段進(jìn)行分組、sales_volume字段進(jìn)行降序后,根據(jù)上面列舉的窗口函數(shù)對sales_volume進(jìn)行聚合。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category )con a.category_id = c.category_id; 結(jié)果如下(部分截圖): 從案例查詢結(jié)果看,數(shù)據(jù)集按照category_id進(jìn)行分組后,按照sales_volume降序排序,將每組中的sales_volume依次聚合;
這涉及到窗口函數(shù)的窗口幀。窗口幀(Window Frame)是在SQL中用于定義開窗函數(shù)計算時要考慮的行的范圍,也就是開窗函數(shù)作用的范圍。窗口幀使用在OVER子句中,與PARTITION BY和ORDER BY聯(lián)合使用來分組,對組內(nèi)的數(shù)據(jù)進(jìn)行排序和聚合。語句緊接著寫在ORDER BY之后。 窗口幀包含以下類型:
窗口幀表達(dá)式的語法如下:
具體x、y可取值如下:
若ORDER BY 后未指定框架,那么默認(rèn)框架將采用 range unbounded preceding and current row,表示從開窗后的第一行到當(dāng)前行。 若分組后不加ORDER BY 或者在ORDER BY 之后加上語句rows between unbounded preceding and unbounded following,也就是對分組中所有的數(shù)據(jù)進(jìn)行聚合運(yùn)算。具體示例和結(jié)果如下: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id) AS cnt_result1 ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS cnt_result2 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ) AS sum_result1 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS sum_result2 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id) AS avg_result1 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS avg_result2 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id) AS max_result1 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS max_result2 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id) AS min_result1 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS min_result2FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN (SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 運(yùn)行結(jié)果如下(部分截圖): 可以看到,不加ORDER BY的字段如sum_result1和 ORDER BY 后+rows between unbounded preceding and unbounded following的字段sum_result2結(jié)果一樣。 03 3.1 函數(shù)說明
3.2 案例說明 問題:請將數(shù)據(jù)集按照商品二級類別(category_id)字段進(jìn)行分組、并利用其他常見窗口函數(shù)對sales_volume字段進(jìn)行處理(主要是為了理解上述函數(shù)的作用效果)。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,LAG(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lag_result ,LEAD(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lead_result ,FIRST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS firstvalue_result ,LAST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lastvalue_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 結(jié)果如下(部分截圖): 其他常見窗口函數(shù)同聚合類型的窗口函數(shù)一樣,也可以結(jié)合窗口幀實現(xiàn)不同的數(shù)據(jù)統(tǒng)計效果。 04 簡單總結(jié)下今天分享的內(nèi)容。 我們詳細(xì)介紹了SQL中的三類窗口函數(shù),并且在每個函數(shù)下使用了相關(guān)的示例演示函數(shù)的使用效果。同時,介紹了窗口幀的使用,讓我們可以在SQL查詢中進(jìn)行更靈活和精確的數(shù)據(jù)分析和聚合。 該文章在 2024/1/13 17:33:11 編輯過 |
關(guān)鍵字查詢
相關(guān)文章
正在查詢... |