C#: Use ToLower and ToUpper with Caution, or They May Drag Your System Down
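Somewhere along the way, many programmers picked up the habit of using ToLower or ToUpper to compare strings case-insensitively. The habit may well have been imported from another language. My bold guess is JS, and to avoid starting an argument, by "JS" I mean jishi (技師, "technician").

I: Background

1. The story

In one of our order aggregation systems, every order is tagged with its source channel, such as JD, Taobao, Etao, or Shopex. The UI also offers an advanced setting where users can type in a custom order source. A customer later reported that entering xxx returned no orders. Take shopex as an example: the user searched for lowercase shopex, but the system stores the source as Shopex with a capital first letter, so nothing matched. To fix this, the developer converted both sides to uppercase before comparing them:

    var orderFrom = "shopex".ToUpper();

    // Uppercases the OrderFrom of every order on every query
    customerIDList = MemoryOrders.Where(i => i.OrderFrom.ToUpper() == orderFrom)
                                 .Select(i => i.CustomerId)
                                 .ToList();

With that change the code went proudly into production. At first glance nothing looked wrong, but a query was noticeably several seconds slower than before. I clicked a few more times and, sure enough, monitoring showed CPU and memory spiking and dropping abnormally. The developer had written another bug. I looked at the code and asked him why he wrote it that way; he said that is how the comparison is done in JS.

2. Rewriting with string.Compare

C# actually has a dedicated method for case-insensitive comparison that is fast and does not waste memory: string.Compare.

    var orderFrom = "shopex";

    // Compares in place, without creating an uppercased copy of each OrderFrom
    customerIDList = MemoryOrders.Where(i => string.Compare(i.OrderFrom, orderFrom, StringComparison.OrdinalIgnoreCase) == 0)
                                 .Select(i => i.CustomerId)
                                 .ToList();

The StringComparison.OrdinalIgnoreCase argument is what makes the comparison ignore case.

II: Why ToLower and ToUpper have such a big impact

To make this easy to demonstrate, I took a short English text and searched it for a single word, to show why ToUpper affects CPU, memory, and query performance so much. The code is as follows:

    public static void Main(string[] args)
    {
        // Split the text into words
        var strList = "Hooray! It's snowing! It's time to make a snowman.James runs out. He makes a big pile of snow. He puts a big snowball on top. He adds a scarf and a hat. He adds an orange for the nose. He adds coal for the eyes and buttons.In the evening, James opens the door. What does he see? The snowman is moving! James invites him in. The snowman has never been inside a house. He says hello to the cat. He plays with paper towels.A moment later, the snowman takes James's hand and goes out.They go up, up, up into the air! They are flying! What a wonderful night!The next morning, James jumps out of bed. He runs to the door.He wants to thank the snowman. But he's gone.".Split(' ');

        var query = "snowman".ToUpper();

        for (int i = 0; i < strList.Length; i++)
        {
            // Creates an uppercased copy of every word
            var str = strList[i].ToUpper();

            if (str == query)
                Console.WriteLine(str);
        }

        Console.ReadLine();
    }

1. Investigating the memory fluctuation

Since memory fluctuates, something unwanted must be ending up in it. Anyone who has learned the basics of C# knows that string is immutable: any modification produces a new string. In other words, every ToUpper call produces a new string. To back this up with data, here is a WinDbg session.

    0:000> !dumpheap -type System.String -stat
    Statistics:
                  MT    Count    TotalSize Class Name
    00007ff8e7a9a120        1           24 System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]]
    00007ff8e7a99e98        1           80 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]]
    00007ff8e7a9a378        1           96 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]][]
    00007ff8e7a93200       19         2264 System.String[]
    00007ff8e7a959c0      429        17894 System.String
    Total 451 objects

The managed heap holds 429 System.String objects. To look at individual instances, list them by method table and then dump one of them:

    !dumpheap -mt 00007ff8e7a959c0
    !DumpObj 000002244282a1f8

    0:000> !DumpObj /d 0000017800008010
    Name:        System.String
    MethodTable: 00007ff8e7a959c0
    EEClass:     00007ff8e7a72ec0
    Size:        38(0x26) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      HOUSE.
    Fields:
                  MT    Field   Offset            Type VT     Attr    Value Name
    00007ff8e7a985a0  4000281        8    System.Int32  1 instance        6 m_stringLength
    00007ff8e7a96838  4000282        c     System.Char  1 instance       48 m_firstChar
    00007ff8e7a959c0  4000286       d8   System.String  0   shared   static Empty
                                     >> Domain:Value  0000017878943bb0:NotInit  <<

    0:000> !DumpObj /d 0000017800008248
    Name:        System.String
    MethodTable: 00007ff8e7a959c0
    EEClass:     00007ff8e7a72ec0
    Size:        40(0x28) bytes
    File:        C:\WINDOWS\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      SNOWMAN
    Fields:
                  MT    Field   Offset            Type VT     Attr    Value Name
    00007ff8e7a985a0  4000281        8    System.Int32  1 instance        7 m_stringLength
    00007ff8e7a96838  4000282        c     System.Char  1 instance       53 m_firstChar
    00007ff8e7a959c0  4000286       d8   System.String  0   shared   static Empty
                                     >> Domain:Value  0000017878943bb0:NotInit  <<

Both strings I inspected are all uppercase: "HOUSE." and "SNOWMAN". Now back to my scenario: with close to a million orders, each query creates close to a million strings on the managed heap, and another click creates close to a million more. No wonder memory spikes.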
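The allocation effect described above can also be observed without WinDbg. The following is a minimal sketch, not the code from the original system: the class and method names (AllocationDemo, MeasureAllocatedBytes, CountWithToUpper, CountWithCompare) are made up for illustration, and it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 or later, for example). It compares the bytes allocated by a ToUpper-based search with those allocated by string.Compare with StringComparison.OrdinalIgnoreCase.

```csharp
using System;

public static class AllocationDemo
{
    // Returns the number of bytes allocated on the current thread while `action` runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Case-insensitive count that uppercases every word first.
    public static int CountWithToUpper(string[] words, string query)
    {
        string upperQuery = query.ToUpper();
        int hits = 0;
        foreach (string word in words)
        {
            // ToUpper creates a new string whenever the word contains lowercase letters
            if (word.ToUpper() == upperQuery) hits++;
        }
        return hits;
    }

    // Case-insensitive count that compares the existing strings directly.
    public static int CountWithCompare(string[] words, string query)
    {
        int hits = 0;
        foreach (string word in words)
        {
            if (string.Compare(word, query, StringComparison.OrdinalIgnoreCase) == 0) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        string[] words = "The snowman is moving! James invites the Snowman in".Split(' ');

        // Warm up both paths so one-time runtime initialization is not measured
        CountWithToUpper(words, "snowman");
        CountWithCompare(words, "snowman");

        long toUpperBytes = MeasureAllocatedBytes(() => CountWithToUpper(words, "snowman"));
        long compareBytes = MeasureAllocatedBytes(() => CountWithCompare(words, "snowman"));

        Console.WriteLine("ToUpper allocated bytes:        " + toUpperBytes);
        Console.WriteLine("string.Compare allocated bytes: " + compareBytes);
    }
}
```

Both methods find the same matches; the difference is only in how many temporary strings are created along the way. The exact byte counts depend on the runtime, so none are quoted here.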
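2. Investigating CPU and query time

By now it is clear that the heap may hold several million string objects. Allocating and releasing them puts considerable pressure on the CPU. ToUpper is slower to begin with, and, worse, it makes the GC fire in repeated bursts. Each time it fires, all threads are suspended while the collection runs, which slows things down even more.

III: A look inside string.Compare

Now let's come back to why string.Compare is so impressive. You can read the source with dnSpy; there is one core function, shown below:

    // Token: 0x060004B8 RID: 1208 RVA: 0x00010C48 File Offset: 0x0000EE48
    private unsafe static int CompareOrdinalIgnoreCaseHelper(string strA, string strB)
    {
        int num = Math.Min(strA.Length, strB.Length);
        fixed (char* ptr = &strA.m_firstChar)
        {
            fixed (char* ptr2 = &strB.m_firstChar)
            {
                char* ptr3 = ptr;
                char* ptr4 = ptr2;
                while (num != 0)
                {
                    int num2 = (int)(*ptr3);
                    int num3 = (int)(*ptr4);
                    // Unsigned range check: true only for 'a'..'z' (97..122).
                    // Without the uint cast, uppercase letters would pass the check too.
                    if ((uint)(num2 - 97) <= 25)
                    {
                        num2 -= 32;   // fold to uppercase
                    }
                    if ((uint)(num3 - 97) <= 25)
                    {
                        num3 -= 32;
                    }
                    if (num2 != num3)
                    {
                        return num2 - num3;
                    }
                    ptr3++;
                    ptr4++;
                    num--;
                }
                return strA.Length - strB.Length;
            }
        }
    }

This code is quite clever. It uses 97, the ASCII code of 'a', to fold each character to uppercase on the fly and compares the two strings character by character by their uppercase ASCII codes. That is far cheaper than piling up new objects on the heap.

Next I changed the code to see what the heap looks like this time.

    public static void Main(string[] args)
    {
        ...

        var query = "snowman";

        for (int i = 0; i < strList.Length; i++)
        {
            // No uppercased copy is created here
            if (string.Compare(strList[i], query, StringComparison.OrdinalIgnoreCase) == 0)
            {
                Console.WriteLine(strList[i]);
            }
        }

        Console.ReadLine();
    }

    0:000> !dumpheap -type System.String -stat
    Statistics:
                  MT    Count    TotalSize Class Name
    00007ff8e7a9a120        1           24 System.Collections.Generic.GenericEqualityComparer`1[[System.String, mscorlib]]
    00007ff8e7a99e98        1           80 System.Collections.Generic.Dictionary`2[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]]
    00007ff8e7a9a378        1           96 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Globalization.CultureData, mscorlib]][]
    00007ff8e7a93200       19         2264 System.String[]
    00007ff8e7a959c0      300        13460 System.String
    Total 322 objects

The number of System.String objects on the heap drops from 429 to 300.

IV: Summary

Careless habits that we get away with day to day fall apart in the face of large data volumes. At the same time, an incident like this is a good opportunity to grow.

This article was edited on 2021/1/30 9:44:09.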
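To make the folding trick in CompareOrdinalIgnoreCaseHelper easier to follow, here is a minimal sketch of the same idea in safe code, without pointers. It is an illustration only, not the framework's implementation: the names OrdinalIgnoreCaseSketch, FoldAscii, and Compare are hypothetical, and the sketch assumes both inputs contain only ASCII characters.

```csharp
using System;

public static class OrdinalIgnoreCaseSketch
{
    // Maps 'a'..'z' to 'A'..'Z' and leaves every other character code unchanged.
    private static int FoldAscii(char c)
    {
        int code = c;
        // For codes below 'a' the subtraction is negative; cast to uint it becomes
        // a huge value, so one comparison covers both ends of the 'a'..'z' range.
        if ((uint)(code - 'a') <= (uint)('z' - 'a'))
        {
            code -= 0x20; // 'a' (97) - 32 = 'A' (65)
        }
        return code;
    }

    // Returns 0 when the strings are equal ignoring ASCII case, otherwise the
    // difference of the first mismatching folded characters (or of the lengths).
    // Assumption: both strings are ASCII-only.
    public static int Compare(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length);
        for (int i = 0; i < n; i++)
        {
            int x = FoldAscii(a[i]);
            int y = FoldAscii(b[i]);
            if (x != y) return x - y;
        }
        return a.Length - b.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Compare("shopex", "Shopex"));      // prints 0
        Console.WriteLine(Compare("shopex", "Taobao") < 0);  // prints True
    }
}
```

No new string is created at any point: the characters are folded into local integers while the loop runs. In real code, calling string.Compare or string.Equals with StringComparison.OrdinalIgnoreCase remains the better choice, since the library version is not limited to ASCII input.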