Collecting monitoring data with the HP-UX Glance adviser
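Administrators of HP-UX systems usually end up using glance, since it is the monitoring tool HP provides.
In practice, however, the process running glance turns out to consume a noticeable amount of CPU.
With a single administrator this is hardly a concern. But when many people are monitoring at once,
glance sessions alone can push CPU utilization up considerably.
In that case it is more resource-efficient to collect the data once through the glance adviser and have the administrators read the collected output instead of each running glance.
The sections below show how to set this up and use it.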
1. Using the HP-UX Glance adviser to save per-APPLICATION CPU, memory, disk I/O and similar metrics at 1-minute intervals
The setup consists of the following three files.
monitor_app.adv : the glance adviser script file
start.sh : script that starts the glance adviser
stop.sh : script that stops the glance adviser
The contents of each file are shown below.
monitor_app.adv
    PRINT "======================================================================="
    PRINT "DATE / TIME: ", GBL_STATDATE, " - ", GBL_STATTIME, " TOT_CPU_USE: ",GBL_CPU_TOTAL_UTIL
    PRINT "======================================================================="
    PRINT "APP name |totalCPU|sysCPU|userCP|phyDSK|logRd |logWr | MEM "
    PRINT "======================================================================="
    APPLICATION LOOP {
      PRINT APP_NAME, "|", APP_CPU_TOTAL_UTIL, "|", APP_CPU_SYS_MODE_UTIL, "|", APP_CPU_USER_MODE_UTIL, "|", APP_DISK_PHYS_IO_RATE, "|", APP_DISK_LOGL_READ_RATE, "|", APP_DISK_LOGL_WRITE_RATE, "|", APP_MEM_RES
    }
start.sh
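    # Timestamp used as the log file suffix (YYMMDDhhmmss).
    DATE=`date "+%y%m%d%H%M%S"`
    # Run glance in adviser-only mode in the background with a 60-second interval.
    nohup glance -j 60 -adviser_only -syntax ./monitor_app.adv 1>> ./log.$DATE 2>/dev/null &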
stop.sh
    # Find the adviser-only glance process that is running monitor_app.adv and kill it.
    kill -9 $(ps -ef | grep adviser_only | grep monitor_app.adv | awk '{print $2}')
Below is the output produced by running the glance adviser with the files above.
log.140714144714
    =======================================================================
    DATE / TIME: 07/14/2014 - 14:47:19
    =======================================================================
    APP name         |totalCPU|sysCPU|userCP|phyDSK|logRd |logWr | MEM
    =======================================================================
    other            |    70.6|  19.0|  51.6|  11.3|  98.3|   8.3|  18.9gb
    network          |     0.0|   0.0|   0.0|   0.0|   0.0|   0.0|  19.8mb
    memory_management|     0.0|   0.0|   0.0|   2.4|   0.0|   0.0|  32.0mb
    other_user_root  |     0.0|   0.0|   0.0|   0.2|   0.8|   1.0| 353.8mb
    =======================================================================
    DATE / TIME: 07/14/2014 - 14:48:19
    =======================================================================
    APP name         |totalCPU|sysCPU|userCP|phyDSK|logRd |logWr | MEM
    =======================================================================
    other            |    50.1|   5.8|  44.3|  31.6|  25.2|  10.6|  18.9gb
    network          |     0.0|   0.0|   0.0|   0.0|   0.0|   0.0|  19.8mb
    memory_management|     0.0|   0.0|   0.0|   3.3|   0.0|   0.0|  32.0mb
    other_user_root  |     0.3|   0.2|   0.0|   0.2|   5.6|   0.4| 354.0mb
    =======================================================================
    DATE / TIME: 07/14/2014 - 14:49:19
    =======================================================================
    APP name         |totalCPU|sysCPU|userCP|phyDSK|logRd |logWr | MEM
    =======================================================================
    other            |    50.5|   6.0|  44.5|  10.0|  27.6|   9.2|  18.9gb
    network          |     0.0|   0.0|   0.0|   0.0|   0.0|   0.0|  19.8mb
    memory_management|     0.0|   0.0|   0.0|   3.4|   0.0|   0.0|  32.0mb
    other_user_root  |     0.4|   0.3|   0.1|   0.5|  88.7|   3.5| 354.0mb
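Because each sample in this log is a timestamp line followed by pipe-separated rows, one application's history can be pulled out with a few lines of awk. The following is only a minimal sketch, assuming the log layout shown above. The function name app_cpu_series is hypothetical and not part of glance.

```shell
# Hypothetical helper: print "date time totalCPU" for one application,
# reading a log written by monitor_app.adv (layout assumed as shown above).
app_cpu_series() {
    # $1 = application name as printed by APP_NAME, $2 = log file
    awk -v app="$1" '
        # Remember the timestamp of the sample currently being read.
        /^DATE \/ TIME:/ { stamp = $4 " " $6; next }
        # Data rows (and the column header) contain pipe separators.
        index($0, "|") {
            split($0, f, "|")
            name = f[1]; gsub(/^[ \t]+|[ \t]+$/, "", name)
            cpu  = f[2]; gsub(/[ \t]/, "", cpu)
            # Exact match, so "other" does not also select "other_user_root".
            if (name == app) print stamp, cpu
        }
    ' "$2"
}
```

For example, `app_cpu_series other ./log.140714144714` would list the total CPU of the "other" application for every recorded minute.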
2. Collecting information on processes using more than 30% CPU
monitor_cpu.adv
    PRINT "======================================================================="
    PRINT "DATE / TIME: ", GBL_STATDATE, " - ", GBL_STATTIME, " TOT_CPU_USE: ",GBL_CPU_TOTAL_UTIL
    PRINT "======================================================================="
    PRINT "PROCESS name |PROCESS id| CPU Usage"
    PRINT "======================================================================="
    PROCESS LOOP {
      if PROC_CPU_TOTAL_UTIL > 30 then {
        PRINT PROC_PROC_NAME|24, PROC_PROC_ID|10," ", PROC_CPU_TOTAL_UTIL|12
      }
    }
start_cpu.sh
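    # Timestamp used as the log file suffix (YYMMDDhhmmss).
    DATE=`date "+%y%m%d%H%M%S"`
    # Run glance in adviser-only mode in the background with a 60-second interval.
    nohup glance -j 60 -adviser_only -syntax ./monitor_cpu.adv 1>> ./log.$DATE 2>/dev/null &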
stop_cpu.sh
    # Find the adviser-only glance process that is running monitor_cpu.adv and kill it.
    kill -9 $(ps -ef | grep adviser_only | grep monitor_cpu.adv | awk '{print $2}')
Below is the output produced by running the glance adviser with the files above.
log.140714155022
    =======================================================================
    DATE / TIME: 07/14/2014 - 15:50:27
    =======================================================================
    PROCESS name            |PROCESS id| CPU Usage
    =======================================================================
    glance                       23679         70.6
    oracleMVNOT                   7027        100.8
    oracleMVNOT                   6243         98.9
    oracleMVNOT                   7032        101.3
    =======================================================================
    DATE / TIME: 07/14/2014 - 15:50:32
    =======================================================================
    PROCESS name            |PROCESS id| CPU Usage
    =======================================================================
    glance                       23679         65.2
    glance                       29478         65.0
    oracleMVNOT                   7027         99.0
    oracleMVNOT                   6243         99.0
    glance                       22084         64.5
    oracleMVNOT                   7032         98.3
    =======================================================================
    DATE / TIME: 07/14/2014 - 15:50:36
    =======================================================================
    PROCESS name            |PROCESS id| CPU Usage
    =======================================================================
    glance                       23679         68.8
    oracleMVNOT                   7027         99.2
    oracleMVNOT                   6243        101.1
    oracleMVNOT                   7032        102.1
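A log like this grows quickly, so it can help to condense it into one line per process. The sketch below counts in how many samples each process crossed the threshold and records its peak CPU value. It assumes the three-column layout shown above and process names without spaces. The function name cpu_hog_counts is hypothetical.

```shell
# Hypothetical helper: summarize a log written by monitor_cpu.adv as
# "name pid samples peakCPU", one line per process.
cpu_hog_counts() {
    # $1 = log file
    awk '
        # Skip separator lines, timestamp lines and the column header.
        /^=/ || /^DATE/ || /^PROCESS name/ { next }
        NF >= 3 {
            key = $1 " " $2                 # process name + PID
            hits[key]++                     # number of samples over the threshold
            if ($3 + 0 > peak[key]) peak[key] = $3 + 0
        }
        END { for (k in hits) print k, hits[k], peak[k] }
    ' "$1" | LC_ALL=C sort
}
```

Run against the log above, this would show at a glance which PIDs stay over the threshold across samples, including the glance processes themselves.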
3. Adviser syntax worth using as a reference
Most of it concerns the alarms shown at the bottom of the glance screen.
    $ cat adviser.syntax
    # The following symptoms are used by the default Alarm Window
    # Bottleneck alarms. They are re-evaluated every interval and
    # the probabilities are summed. These summed probabilities are
    # checked by the bottleneck alarms. The buttons on the gpm
    # main window will turn yellow when a probability exceeds 50%
    # for an interval, and red when a probability exceeds 90% for
    # an interval. You may edit these rules to suit your environment:

    symptom CPU_Bottleneck type=CPU
      rule GBL_CPU_TOTAL_UTIL > 75 prob 25
      rule GBL_CPU_TOTAL_UTIL > 85 prob 25
      rule GBL_CPU_TOTAL_UTIL > 90 prob 25
      rule GBL_PRI_QUEUE > 3 prob 25

    symptom Disk_Bottleneck type=DISK
      rule GBL_DISK_UTIL_PEAK > 50 prob GBL_DISK_UTIL_PEAK
      rule GBL_DISK_SUBSYSTEM_QUEUE > 3 prob 25

    symptom Memory_Bottleneck type=MEMORY
      rule GBL_MEM_QUEUE > 2 prob 20
      rule GBL_MEM_PAGEOUT_RATE > 5 prob 20
      rule GBL_MEM_PAGEOUT_RATE > 50 prob 20
      rule GBL_DISK_VM_WRITE_RATE > 5 prob 20
      rule GBL_DISK_VM_WRITE_RATE > 50 prob 20
      rule GBL_MEM_SWAPOUT_RATE > 1 prob 35
      rule GBL_MEM_SWAPOUT_RATE > 4 prob 50

    # this symptom definition is only available for 11.0
    symptom Network_Bottleneck type=NETWORK
      rule GBL_NET_OUTQUEUE > 0 prob 10
      rule GBL_NET_OUTQUEUE > 1 prob 25
      rule GBL_NFS_CALL_RATE > 500 prob 10
      rule GBL_NET_COLLISION_PCT > 10 prob 10
      rule GBL_NET_COLLISION_PCT > 25 prob 20
      rule GBL_NET_COLLISION_PCT > 50 prob 30
      rule GBL_NET_PACKET_RATE > 500 prob 10
      rule GBL_NET_PACKET_RATE > 1000 prob 10
      rule GBL_NET_PACKET_RATE > 3000 prob 20
      rule GBL_NET_PACKET_RATE > 5000 prob 20
      rule GBL_NET_PACKET_RATE > 9000 prob 20

    # Below are the primary CPU, Disk, Memory, and Network Bottleneck alarms.
    # For each area, a calculated bottleneck symptom probability is used
    # to define yellow or red alerts.

    alarm CPU_Bottleneck > 50 for 2 minutes
      start
        if CPU_Bottleneck > 90 then
          red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
        else
          yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
      repeat every 10 minutes
        if CPU_Bottleneck > 90 then
          red alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
        else
          yellow alert "CPU Bottleneck probability= ", CPU_Bottleneck, "%"
      end
        reset alert "End of CPU Bottleneck Alert"

    alarm Disk_Bottleneck > 50 for 2 minutes
      start
        if Disk_Bottleneck > 90 then
          red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
        else
          yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
      repeat every 10 minutes
        if Disk_Bottleneck > 90 then
          red alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
        else
          yellow alert "Disk Bottleneck probability= ", Disk_Bottleneck, "%"
      end
        reset alert "End of Disk Bottleneck Alert"

    alarm Memory_Bottleneck > 50 for 2 minutes
      start
        if Memory_Bottleneck > 90 then
          red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
        else
          yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
      repeat every 10 minutes
        if Memory_Bottleneck > 90 then
          red alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
        else
          yellow alert "Memory Bottleneck probability= ", Memory_Bottleneck, "%"
      end
        reset alert "End of Memory Bottleneck Alert"

    alarm Network_Bottleneck > 50 for 2 minutes
      start
        if Network_Bottleneck > 90 then
          red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
        else
          yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
      repeat every 10 minutes
        if Network_Bottleneck > 90 then
          red alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
        else
          yellow alert "Network Bottleneck probability= ", Network_Bottleneck, "%"
      end
        reset alert "End of Network Bottleneck Alert"

    # We will alarm according to the percentage of errors only when the packet
    # rate exceeds a threshold. The values may need to be modified for your
    # environment.
    alarm (GBL_NET_PACKET_RATE > 100) and
          ((GBL_NET_IN_ERROR_PCT > 4) or (GBL_NET_OUT_ERROR_PCT > 2))
      start
        yellow alert "Network error rate exceeded threshold"
      end
        reset alert "End of network error rate alert"

    # The following are system table alarms. If gpm overhead is a concern, and
    # you think you will not have system table shortage problems, you may wish
    # to delete these alarms.

    # Global swap space utilization alarm:
    alarm GBL_SWAP_SPACE_UTIL > 95
      start
        red alert "Global swap space is nearly full"
      end
        reset alert "End of global swap space full condition"

    # Shared memory table alarm:
    alarm TBL_SHMEM_TABLE_UTIL > 90
      start
        red alert "Shared memory table is nearly full"
      end
        reset alert "End of shared memory table full condition"

    # Semaphore table alarm:
    alarm TBL_SEM_TABLE_UTIL > 90
      start
        red alert "Semaphore table is nearly full"
      end
        reset alert "End of semaphore table full condition"

    # Message queue table alarm:
    alarm TBL_MSG_TABLE_UTIL > 90
      start
        red alert "Message queue table is nearly full"
      end
        reset alert "End of message queue full condition"

    # Process table alarm:
    alarm TBL_PROC_TABLE_UTIL > 90
      start
        red alert "Process table is nearly full"
      end
        reset alert "End of process table full condition"

    # File table alarm:
    alarm TBL_FILE_TABLE_UTIL > 90
      start
        red alert "File table is nearly full"
      end
        reset alert "End of file table full condition"

    # File lock table alarm:
    alarm TBL_FILE_LOCK_UTIL > 90
      start
        red alert "File lock table is nearly full"
      end
        reset alert "End of file lock table full condition"

    # This alarm tests for Transaction Tracker overflows. If you have old
    # transactions then restarting the ttd will free up that memory. Otherwise,
    # you may need to restart the midaemon with the -smdvss parm to increase
    # midaemon capacity.
    alarm GBL_TT_OVERFLOW_COUNT > 0
      start
        yellow alert "Transaction Tracker overflow - restart ttd or midaemon - see man pages"
      repeat every 30 minutes
        yellow alert "Transaction Tracker overflow"

    # This alarm tests for lost MI trace buffers by the kernel instrumentation.
    # If this value has increased during the interval, then this alarm triggers.
    # Intermittent lost buffers can be expected on busy systems, however
    # consistent buffer loss can lead to incorrect performance information being
    # reported by the tools. If this alarm triggers often, you may wish to
    # log a call with your local HP Response Center.

    # initiallost variable used to keep track of how many lost buffers there
    # were when glance was first invoked:
    initiallost = initiallost
    if initiallost == 0 then
      initiallost = GBL_LOST_MI_TRACE_BUFFERS

    # lostbufs variable tracks increases in the cumulative count of lost buffers
    lostbufs = lostbufs
    alarm (lostbufs < GBL_LOST_MI_TRACE_BUFFERS) and
          (initiallost < GBL_LOST_MI_TRACE_BUFFERS)
      start
      {
        yellow alert "MI trace buffer loss detected"
        lostbufs = GBL_LOST_MI_TRACE_BUFFERS
      }
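The comments in the file say that the probabilities of all matching rules in a symptom are summed. To make that arithmetic concrete, here is a small sketch that reproduces the CPU_Bottleneck sum outside of glance. It is only an illustration of the four rules listed above, and the function name cpu_bottleneck_prob is hypothetical. It does not model the "for 2 minutes" duration that the alarm also requires.

```shell
# Hypothetical illustration of how the CPU_Bottleneck rules add up.
cpu_bottleneck_prob() {
    # $1 = GBL_CPU_TOTAL_UTIL (percent), $2 = GBL_PRI_QUEUE
    awk -v util="$1" -v queue="$2" 'BEGIN {
        p = 0
        if (util > 75)  p += 25   # rule GBL_CPU_TOTAL_UTIL > 75 prob 25
        if (util > 85)  p += 25   # rule GBL_CPU_TOTAL_UTIL > 85 prob 25
        if (util > 90)  p += 25   # rule GBL_CPU_TOTAL_UTIL > 90 prob 25
        if (queue > 3)  p += 25   # rule GBL_PRI_QUEUE > 3 prob 25
        print p
    }'
}
```

With 88% CPU utilization and a priority queue of 4, three rules match and the sum is 75, which is above the 50 needed for a yellow alert but not above the 90 needed for a red one.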
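4. Conclusion
When many glance sessions run at once and the monitoring itself starts to cost more than what it monitors, the remedy is to run the glance adviser
and have it record the information you need. Reading the recorded output with tail or a similar tool then keeps resource consumption down.
Recording the metrics you care about this way also brings other benefits, such as being able to keep a history of them.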
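As a small convenience for reading the recorded output, the sketch below picks the newest log file so it can be followed with tail. It assumes the log.YYMMDDhhmmss naming produced by start.sh, which sorts chronologically as plain text. The function name latest_log is hypothetical.

```shell
# Hypothetical helper: print the newest adviser log in a directory,
# relying on the log.YYMMDDhhmmss naming used by start.sh.
latest_log() {
    # $1 = directory that holds the log.* files
    ls -1 "$1"/log.* 2>/dev/null | LC_ALL=C sort | tail -n 1
}

# Typical use (not executed here): follow the newest log instead of
# starting another glance session.
#   tail -f "$(latest_log .)"
```

Each administrator then only runs a tail process, while a single adviser-only glance does the actual collection.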
Posted by 명품관, 2016.04.08