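Redis monitoring tools, commands, and tuning
1. Graphical monitoring
I needed to run performance tests against Redis and found RedisLive, a well-reviewed monitoring tool on GitHub written in Python. After fiddling with it for a long while, I discovered that its home page pulls in Google's jsapi script, so it must be able to reach Google's servers; according to Stack Overflow, downloading the script locally does not solve the problem either. Just as I was about to give up, I found redis-monitor, a project forked from RedisLive (probably modified by a Chinese developer) that drops the dependency on Google's jsapi and improves the management of multiple Redis instances. At last I got to see the long-awaited charts.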
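First make sure Python is installed, then install the following Python packages. You can download each tar.gz manually, unpack it, and run python setup.py install one by one, or install them directly with pip:
- tornado: a Python web framework
- redis-py: the Python client for Redis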
- python-dateutil
- backports.ssl_match_hostname
- argparse
- setuptools
- six
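Next, download and unpack redis-monitor-master from GitHub and edit src/redis_live.conf. You must configure a separate Redis instance to store the monitoring data, and you can configure multiple Redis instances to monitor. Starting redis-monitor is a bit tedious: it needs two foreground processes and two background processes: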
#in src/script/redis-monitor.sh add redis-monitor as a startup service
#start web with port 8888
$ python redis_live.py
# start info collector
$ python redis_monitor.py
#start daemon
$ python redis_live_daemon.py
$ python redis_monitor_daemon.py
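2. Command-line monitoring
As shown above, graphical monitoring is attractive and direct but troublesome to install. If you only want a quick look at the load on Redis, the commands it ships with are enough.
2.1 Throughput
The INFO command reports the current throughput (ops/sec) along with other useful runtime information. Below, grep filters out some of the more important real-time fields: connected and blocked clients, memory in use, rejected connections, evicted keys, instantaneous operations per second, and network traffic: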
[root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1 info | grep -e "connected_clients" -e "blocked_clients" -e "used_memory_human" -e "used_memory_peak_human" -e "rejected_connections" -e "evicted_keys" -e "instantaneous"
connected_clients:1
blocked_clients:0
used_memory_human:799.66K
used_memory_peak_human:852.35K
instantaneous_ops_per_sec:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
evicted_keys:0
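The same filtering can be done in a script. The following is a minimal sketch, assuming the INFO reply is available as plain text in the `key:value` format shown above; the names `parse_info`, `select_metrics`, and `WATCHED` are made up for this illustration and are not part of Redis or any client library.

```python
# Substrings to keep, mirroring the grep patterns used in the shell command.
WATCHED = (
    "connected_clients",
    "blocked_clients",
    "used_memory_human",
    "used_memory_peak_human",
    "rejected_connections",
    "evicted_keys",
    "instantaneous",
)


def parse_info(text):
    """Parse INFO output (key:value lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Clients".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def select_metrics(fields, patterns=WATCHED):
    """Keep only the fields whose name contains one of the patterns."""
    return {k: v for k, v in fields.items() if any(p in k for p in patterns)}
```

Matching on substrings rather than exact names keeps the behaviour close to grep, so a pattern like `instantaneous` still picks up every `instantaneous_*` field.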
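2.2 Latency
2.2.1 Client-side PING
Latency can be monitored from the client by sending PING to the server continuously and timing how long the PONG reply takes. Open two terminals, one to measure latency and one to watch the commands the server receives: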
[root@vm redis-3.0.3]# src/redis-cli --latency -h 127.0.0.1
min: 0, max: 1, avg: 0.08
[root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1
127.0.0.1:6379> monitor
OK
1439361594.867170 [0 127.0.0.1:59737] "PING"
1439361594.877413 [0 127.0.0.1:59737] "PING"
1439361594.887643 [0 127.0.0.1:59737] "PING"
1439361594.897858 [0 127.0.0.1:59737] "PING"
1439361594.908063 [0 127.0.0.1:59737] "PING"
1439361594.918277 [0 127.0.0.1:59737] "PING"
1439361594.928469 [0 127.0.0.1:59737] "PING"
1439361594.938693 [0 127.0.0.1:59737] "PING"
1439361594.948899 [0 127.0.0.1:59737] "PING"
1439361594.959110 [0 127.0.0.1:59737] "PING"
If we deliberately create latency with the DEBUG command, the output changes:
[root@vm redis-3.0.3]# src/redis-cli --latency -h 127.0.0.1
min: 0, max: 1995, avg: 1.60 (2361 samples)
[root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1
127.0.0.1:6379> debug sleep 1
OK
(1.00s)
127.0.0.1:6379> debug sleep .15
OK
127.0.0.1:6379> debug sleep .5
OK
(0.50s)
127.0.0.1:6379> debug sleep 2
OK
(2.00s)
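The client-side measurement in 2.2.1 boils down to timing a round trip over and over. Below is a rough sketch of that loop, not the redis-cli implementation: `measure_latency` is a hypothetical helper, `ping` stands for any callable that performs one round trip, and the 10 ms default interval is an assumption read off the MONITOR timestamps above.

```python
import time


def measure_latency(ping, samples=100, interval=0.01):
    """Call ping() repeatedly and return (min, max, avg) round-trip time in ms."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()  # one request/response round trip
        timings.append((time.perf_counter() - start) * 1000.0)
        time.sleep(interval)  # pause between samples
    return min(timings), max(timings), sum(timings) / len(timings)
```

With redis-py installed, passing the `ping` method of a `redis.Redis(host="127.0.0.1")` client as the callable should give numbers comparable to the `--latency` output, plus some Python overhead.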
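2.2.2 Server-side mechanism
Server-side latency monitoring takes a little more setup, because the latency-monitor threshold defaults to 0, which means the monitor is off. Although its cost in memory and CPU time is small, Redis leaves it disabled by default for the sake of performance. So first enable it by setting a reasonable threshold, such as the 100 ms used below: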
127.0.0.1:6379> CONFIG SET latency-monitor-threshold 100
OK
Because Redis normally executes commands very quickly, we use the DEBUG command to create some slow commands artificially:
127.0.0.1:6379> debug sleep 2
OK
(2.00s)
127.0.0.1:6379> debug sleep .15
OK
127.0.0.1:6379> debug sleep .5
OK
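Now use the LATENCY subcommands to inspect the latency records:
- LATEST: the four fields are the event name, the Unix timestamp of the latest spike, the latest latency, and the all-time maximum latency (both latencies in milliseconds).
- HISTORY: the latency time series, which can be used to build charts or reports.
- GRAPH: draws the series as an ASCII graph. The labels written vertically at the bottom show how long ago each spike happened.
- RESET: clears the latency records.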
127.0.0.1:6379> latency latest
1) 1) "command"
2) (integer) 1439358778
3) (integer) 500
4) (integer) 2000
127.0.0.1:6379> latency history command
1) 1) (integer) 1439358773
2) (integer) 2000
2) 1) (integer) 1439358776
2) (integer) 150
3) 1) (integer) 1439358778
2) (integer) 500
127.0.0.1:6379> latency graph command
command - high 2000 ms, low 150 ms (all time high 2000 ms)
--------------------------------------------------------------------------------
#
|
|
|_#
666
mmm
Run one more DEBUG command and the GRAPH output changes: a new bar appears, and the 2s label beneath it means that spike happened two seconds ago:
127.0.0.1:6379> debug sleep 1.5
OK
(1.50s)
127.0.0.1:6379> latency graph command
command - high 2000 ms, low 150 ms (all time high 2000 ms)
--------------------------------------------------------------------------------
#
| #
| |
|_#|
2222
333s
mmm
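There is also an interesting subcommand, DOCTOR, which prints advice: for example, enabling the slow log to track down the cause, checking whether large objects are being evicted or expired, and operating-system configuration suggestions.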
127.0.0.1:6379> latency doctor
Dave, I have observed latency spikes in this Redis instance. You don't mind talking about it, do you Dave?
1. command: 3 latency spikes (average 883ms, mean deviation 744ms, period 210.00 sec). Worst all time event 2000ms.
I have a few advices for you:
- Check your Slow Log to understand what are the commands you are running which are too slow to execute. Please check http:///commands/slowlog for more information.
- Deleting, expiring or evicting (because of maxmemory policy) large objects is a blocking operation. If you have very large objects that are often deleted, expired, or evicted, try to fragment those objects into multiple smaller objects.
- I detected a non zero amount of anonymous huge pages used by your process. This creates very serious latency events in different conditions, especially when Redis is persisting on disk. To disable THP support use the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled', make sure to also add it into /etc/rc.local so that the command will be executed again after a reboot. Note that even if you have already disabled THP, you still need to restart the Redis process to get rid of the huge pages already created.
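The "average 883ms, mean deviation 744ms" figures in the report can be reproduced from the three samples that LATENCY HISTORY returned earlier (2000, 150, and 500 ms). The sketch below assumes the report uses a plain arithmetic mean and a mean absolute deviation with integer division; that assumption was chosen because it matches the numbers above, and `spike_stats` is a hypothetical helper name.

```python
def spike_stats(latencies_ms):
    """Return (average, mean absolute deviation) using integer arithmetic."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    n = len(latencies_ms)
    avg = sum(latencies_ms) // n
    # Mean absolute deviation: average distance of each sample from the mean.
    mad = sum(abs(avg - value) for value in latencies_ms) // n
    return avg, mad


history = [2000, 150, 500]  # latencies from LATENCY HISTORY command
print(spike_stats(history))  # (883, 744)
```

A mean deviation nearly as large as the average, as here, indicates that the spikes vary widely in size rather than clustering around one value.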
2.2.3 Measuring the latency baseline
Part of the latency comes from the environment: the operating system kernel, the virtualization layer, and so on. Redis provides a way to measure this baseline:
[root@vm redis-3.0.3]# src/redis-cli --intrinsic-latency 100 -h 127.0.0.1
Max latency so far: 2 microseconds.
Max latency so far: 3 microseconds.
Max latency so far: 26 microseconds.
Max latency so far: 37 microseconds.
Max latency so far: 1179 microseconds.
Max latency so far: 1623 microseconds.
Max latency so far: 1795 microseconds.
Max latency so far: 2142 microseconds.
35818026 total runs (avg latency: 2.7919 microseconds / 27918.90 nanoseconds per run).
Worst run took 767x longer than the average latency.
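The argument after --intrinsic-latency is the test duration in seconds; 100 seconds is usually enough. This test runs inside redis-cli itself rather than against a server, so run it on the machine that hosts Redis.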
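To make the idea of a baseline concrete, here is a rough sketch of the general technique as I understand it: spin in a tight loop, read the clock on every iteration, and remember the largest gap between two consecutive readings. It is not the redis-cli code, `intrinsic_latency` is a hypothetical name, and the Python interpreter adds its own overhead, so the absolute numbers are not comparable to the output above.

```python
import time


def intrinsic_latency(duration_s=1.0):
    """Busy-loop for duration_s seconds.

    Returns (runs, max_gap_us, avg_gap_us), where a gap is the time between
    two consecutive clock readings in the loop.
    """
    start = time.perf_counter_ns()
    end = start + int(duration_s * 1e9)
    prev = start
    runs = 0
    max_gap = 0
    while True:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > max_gap:
            max_gap = gap  # the process was stalled for at least this long
        prev = now
        runs += 1
        if now >= end:
            break
    avg_gap = (prev - start) / runs
    return runs, max_gap / 1000.0, avg_gap / 1000.0
```

A maximum gap far above the average suggests that the kernel or hypervisor paused the process, and no amount of Redis tuning removes that kind of latency.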
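2.3 Continuous real-time monitoring
The Unix watch command is a very handy tool that re-runs any command at a fixed interval and displays its output. With a small change, the commands above become continuous monitors: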
[root@vm redis-3.0.3]# watch -n 1 -d 'src/redis-cli -h 127.0.0.1 info | grep -e "connected_clients" -e "blocked_clients" -e "used_memory_human" -e "used_memory_peak_human" -e "rejected_connections" -e "evicted_keys" -e "instantaneous"'
Every 1.0s: src/redis-cli -h 127.0.0.1 info | grep -e... Wed Aug 12 14:30:40 2015
connected_clients:1
blocked_clients:0
used_memory_human:799.66K
used_memory_peak_human:852.35K
instantaneous_ops_per_sec:0
instantaneous_input_kbps:0.01
instantaneous_output_kbps:1.23
rejected_connections:0
evicted_keys:0
[root@vm redis-3.0.3]# watch -n 1 -d "src/redis-cli -h 127.0.0.1 latency graph command"
Every 1.0s: src/redis-cli -h 127.0.0.1 latency graph command Wed Aug 12 14:33:25 2015
command - high 2000 ms, low 150 ms (all time high 2000 ms)
--------------------------------------------------------------------------------
#
| #
| |
|_#|
4441
0006
mmmm
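2.4 Slow log
Operations such as SORT, LREM, and SUNION can be very slow on large objects, so check the time complexity listed for each command in the official command reference before using them. The slow log, which the DOCTOR output above already pointed to, records how long commands take to execute. Just as mainstream databases offer a slow query log, Redis logs slow operations. Note that this log measures only the time spent executing the command itself, not the I/O with the client.
slowlog-log-slower-than sets the threshold for a slow operation in microseconds, and slowlog-max-len sets how many entries are kept, because the slow log, like the latency records, is held in memory: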
127.0.0.1:6379> config set slowlog-log-slower-than 500
OK
127.0.0.1:6379> debug sleep 1
OK
(0.50s)
127.0.0.1:6379> debug sleep .6
OK
127.0.0.1:6379> slowlog get 10
1) 1) (integer) 2
2) (integer) 1439369937
3) (integer) 473178
4) 1) "debug"
2) "sleep"
3) ".6"
2) 1) (integer) 1
2) (integer) 1439369821
3) (integer) 499357
4) 1) "debug"
2) "sleep"
3) "1"
3) 1) (integer) 0
2) (integer) 1439365058
3) (integer) 417846
4) 1) "debug"
2) "sleep"
3) "1"
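The four fields of each entry are: the auto-incrementing ID of the entry, the Unix timestamp at which the command ran, the execution time in microseconds, and the command with its arguments. Note that the durations recorded for the DEBUG commands above (roughly 0.42 to 0.50 s) do not match the requested sleep times.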
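Because the duration field is in microseconds and the timestamp is a raw Unix time, the entries are easier to read after a small conversion. This is a minimal sketch; `format_slowlog` is a hypothetical helper, and the tuple layout simply mirrors the four fields of a SLOWLOG GET entry rather than what any particular client library returns.

```python
from datetime import datetime, timezone


def format_slowlog(entries):
    """Render (id, unix_ts, duration_us, args) tuples as readable lines."""
    lines = []
    for entry_id, timestamp, duration_us, args in entries:
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        lines.append(
            "#{} {} UTC {:.3f} ms {}".format(
                entry_id,
                when.strftime("%Y-%m-%d %H:%M:%S"),
                duration_us / 1000,  # microseconds to milliseconds
                " ".join(args),
            )
        )
    return lines


# The first entry from the SLOWLOG GET output above.
for line in format_slowlog([(2, 1439369937, 473178, ["debug", "sleep", ".6"])]):
    print(line)
```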
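3. Official optimization advice
3.1 Network latency
Clients connect to Redis over TCP/IP or a Unix domain socket. On a typical gigabit network the TCP/IP latency is about 200 us (microseconds), while a Unix domain socket can be as low as 30 us. Unix domain sockets are a fairly common technique; for a concrete setup, see how Nginx and PHP-FPM are configured to talk over one.
What is a domain socket?
Wikipedia: "A Unix domain socket, or IPC socket, is an endpoint that lets two or more processes on the same operating system exchange data. Compared with pipes, Unix domain sockets support both byte streams and datagrams, whereas pipes carry only byte streams. The interface of Unix domain sockets is very similar to that of Internet sockets, but it does not use an underlying network protocol to communicate. Unix domain sockets are a standard component of POSIX operating systems. They use a file system path as their address, which processes can reference, so two processes can open the same Unix domain socket to communicate. The communication takes place entirely inside the kernel and never travels over the network."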
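The description above can be tried out with nothing but the standard library. The following sketch sends a few bytes through a Unix domain socket within a single process; it only illustrates the mechanism, the socket file name `demo.sock` and the function name are made up, and it assumes a Unix-like system where `socket.AF_UNIX` is available.

```python
import os
import socket
import tempfile


def echo_over_unix_socket(payload):
    """Send payload through a Unix domain socket; return what the peer read."""
    with tempfile.TemporaryDirectory() as tmp:
        # The socket is addressed by a file system path, not host and port.
        path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                conn, _ = server.accept()
                with conn:
                    client.sendall(payload)
                    client.shutdown(socket.SHUT_WR)  # signal end of data
                    chunks = []
                    while True:
                        data = conn.recv(4096)
                        if not data:
                            break
                        chunks.append(data)
        return b"".join(chunks)


print(echo_over_unix_socket(b"PING"))  # b'PING'
```

For Redis itself, the server side is enabled with the `unixsocket` directive in redis.conf, and redis-py clients can connect by passing a `unix_socket_path` argument; the actual path depends on your setup.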
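On the network side, what we can do is cut down on round trips (RTT, Round-Trip Time). The official documentation offers these suggestions:
- Persistent connections: do not connect to and disconnect from the server repeatedly; keep connections open as long as possible (Jedis already does this).
- Domain sockets: if the client and the Redis server are on the same machine, use a Unix domain socket.
- Multi-argument commands: prefer multi-argument commands such as MSET/MGET/HMSET/HMGET over pipelining.
- Pipelining: failing that, use pipelining to reduce round trips.
- Lua scripts: for commands with data dependencies that cannot be pipelined, consider running a Lua script on the Redis server.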
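A little arithmetic shows why the suggestions above focus on round trips. The sketch below is a deliberately simplified cost model: it counts only the time spent waiting on round trips and ignores server execution time and bandwidth. The 200 us figure is the gigabit estimate quoted in this section, and `round_trip_cost_us` is a hypothetical helper.

```python
import math


def round_trip_cost_us(commands, rtt_us, batch_size=1):
    """Time spent waiting on round trips when commands are sent in batches."""
    if commands < 0 or batch_size < 1:
        raise ValueError("commands must be >= 0 and batch_size >= 1")
    # One round trip per batch, however many commands the batch carries.
    return math.ceil(commands / batch_size) * rtt_us


print(round_trip_cost_us(1000, 200))       # 200000 (one command per trip)
print(round_trip_cost_us(1000, 200, 100))  # 2000 (100 commands per trip)
```

Under this model, sending 1000 commands one at a time costs 200 ms of pure waiting, while batches of 100 (via a multi-argument command or a pipeline) cut that to 2 ms.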
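3.2 Disk I/O
3.2.1 Writing to disk
Redis, like Nginx, is single-threaded and built on I/O multiplexing, yet unlike Nginx it does not provide a CPU affinity setting. This avoids the case where a forked child process ends up on the same CPU core that the Redis main process is pinned to, so that the background process interferes with the main one. It is therefore better to let the operating system schedule the Redis main process and its background processes. Conversely, if persistence is disabled, could setting CPU affinity for Redis improve performance further?
3.2.2 Operating system swap
If the system runs short of memory, some of the pages belonging to Redis may be swapped out to disk. The smaps file under /proc shows whether any pages have been swapped. If you find a large number of swapped pages, use vmstat and iostat to investigate further: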
[root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1 info | grep process_id
process_id:24191
[root@vm redis-3.0.3]# cat /proc/24191/smaps | grep "Swap"
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
...
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
Swap: 0 kB
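Reading dozens of `Swap:` lines by eye gets tedious, so it can help to add them up. This is a small sketch that works on the text of an smaps file; `total_swap_kb` is a hypothetical helper, and it assumes the `Swap: <n> kB` line format shown in the output above.

```python
def total_swap_kb(smaps_text):
    """Sum the 'Swap:' values (in kB) across all mappings in smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        parts = line.split()
        # Only exact "Swap:" lines count; fields such as "SwapPss:" are skipped.
        if len(parts) == 3 and parts[0] == "Swap:" and parts[2] == "kB":
            total += int(parts[1])
    return total
```

On a live system the text could come from reading `/proc/<pid>/smaps`, with the pid taken from the `process_id` field of INFO as in the transcript above. A total of 0 means no Redis pages are currently swapped out.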
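3.3 Other factors
3.3.1 Forking child processes
Writing an RDB file and rewriting the AOF both fork a background process. The main cost of fork is copying the page table, and the time it takes varies between systems; Xen is notably bad in this respect.
3.3.2 Transparent Huge Pages
In addition, if Linux has THP (Transparent Huge Pages) enabled, latency suffers badly.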
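3.3.3 Key expiration
Redis removes expired keys in two ways, actively and passively:
- Passive: when a client accesses a key and it turns out to be expired, the key is removed.
- Active: every 100 ms a batch of keys is checked and the expired ones are removed; if more than 25% of them were expired, the cycle repeats.
So, to keep Redis from blocking because more than 25% of keys expire at the same moment, randomize expiration times a little when you set them.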
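One simple way to randomize expiration times is to add jitter around a base TTL. The sketch below is one possible approach, not something prescribed by the Redis documentation; the function name and the default spread of 10% are assumptions for illustration.

```python
import random


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Return a TTL in whole seconds, picked from base_seconds +/- spread."""
    if base_seconds <= 0 or not 0 <= spread < 1:
        raise ValueError("base_seconds must be > 0 and 0 <= spread < 1")
    low = base_seconds * (1 - spread)
    high = base_seconds * (1 + spread)
    # Never return less than one second, so the key is not expired at once.
    return max(1, round(rng.uniform(low, high)))
```

With a one-hour base and the default spread, keys written together expire anywhere between 54 and 66 minutes later instead of all in the same 100 ms cycle. The result can be passed as the expiry argument of commands such as EXPIRE or SETEX.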
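4. The last resort: WatchDog
The last resort offered by the official documentation is the software watchdog. It dumps the full call stack of a slow operation into the Redis log. Unlike the latency records and the slow log described earlier, which are kept in memory, this output goes to the log file, so remember to set the logfile path in redis.conf before using it: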
[root@vm redis-3.0.3]# src/redis-cli -h 127.0.0.1
127.0.0.1:6379> CONFIG SET watchdog-period 500
OK
127.0.0.1:6379> debug sleep 1
OK
[root@vm redis-3.0.3]# tailf redis.log
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
51091:M 12 Aug 15:36:53.337 # Server started, Redis version 3.0.3
51091:M 12 Aug 15:36:53.338 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
51091:M 12 Aug 15:36:53.338 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
51091:M 12 Aug 15:36:53.343 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
51091:M 12 Aug 15:36:53.343 * DB loaded from disk: 0.000 seconds
51091:M 12 Aug 15:36:53.343 * The server is now ready to accept connections on port 6379
51091:signal-handler (1439365058)
--- WATCHDOG TIMER EXPIRED ---
src/redis-server 127.0.0.1:6379(logStackTrace+0x43)[0x450363]
/lib64/libpthread.so.0(__nanosleep+0x2d)[0x3c0740ef3d]
/lib64/libpthread.so.0[0x3c0740f710]
/lib64/libpthread.so.0[0x3c0740f710]
/lib64/libpthread.so.0(__nanosleep+0x2d)[0x3c0740ef3d]
src/redis-server 127.0.0.1:6379(debugCommand+0x58d)[0x45180d]
src/redis-server 127.0.0.1:6379(call+0x72)[0x4201b2]
src/redis-server 127.0.0.1:6379(processCommand+0x3e5)[0x4207d5]
src/redis-server 127.0.0.1:6379(processInputBuffer+0x4f)[0x42c66f]
src/redis-server 127.0.0.1:6379(readQueryFromClient+0xc2)[0x42c7b2]
src/redis-server 127.0.0.1:6379(aeProcessEvents+0x13c)[0x41a52c]
src/redis-server 127.0.0.1:6379(aeMain+0x2b)[0x41a7eb]
src/redis-server 127.0.0.1:6379(main+0x2cd)[0x423c8d]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3c0701ed5d]
src/redis-server 127.0.0.1:6379[0x419b49]
51091:signal-handler (51091:signal-handler (1439365058) ) --------
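Appendix: references
The official Redis documentation is really well written. You can learn a lot from it, not only about Redis but about systems in general. I recommend reading every topic on the official site carefully.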
- Redis latency monitoring framework
- Redis latency problems troubleshooting
- SLOWLOG