Write a real-time shell script to find hardware errors on multiple servers
To find hardware errors on multiple servers, you can use a shell script that connects to each server via SSH, searches for hardware errors, and reports them. On Linux systems, hardware-related errors usually show up in the kernel ring buffer (dmesg) and in /var/log/syslog and /var/log/kern.log on Debian/Ubuntu systems, or /var/log/messages on RHEL/CentOS-based systems.
Here’s an example shell script that uses ssh to connect to multiple servers, looks for hardware-related errors in their logs, and prints them out.
Prerequisites:
Ensure you have SSH access set up with keys or passwordless login for seamless automation.
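Make sure dmesg is available (it is part of util-linux and normally preinstalled) and that the remote user can run it and read the log files. On some systems this requires root or sudo.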
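Before running the main script, it can help to confirm that key-based login really works for every server. The following is a minimal sketch, not part of the original script: check_ssh_access, report_ssh_access, and the SSH_BIN override are hypothetical names introduced here for illustration. BatchMode=yes makes ssh fail instead of prompting for a password, so a host without working keys shows up as FAIL.

```shell
#!/bin/bash
# SSH_BIN is a hypothetical override (assumption) so the check can be tried with a stub
SSH_BIN="${SSH_BIN:-ssh}"

# Return 0 if a non-interactive login to the host succeeds, non-zero otherwise
check_ssh_access() {
    local host=$1
    # BatchMode=yes disables password prompts; ConnectTimeout avoids long hangs
    "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null
}

# Print one status line per host passed as an argument
report_ssh_access() {
    local host
    for host in "$@"; do
        if check_ssh_access "$host"; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

For example, report_ssh_access server1 server2 server3 prints one OK or FAIL line per server.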
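#!/bin/bash
# List of servers to check
SERVERS=("server1" "server2" "server3")
# Patterns to search for hardware errors (matched case-insensitively)
ERROR_PATTERNS=("hardware error" "Machine check error" "mce" "ECC error" "CPU error" "Memory error")
# SSH options (adjust based on your SSH config)
# BatchMode fails instead of prompting for a password; ConnectTimeout avoids long hangs.
# Note: StrictHostKeyChecking=no skips host key verification, so only use it on trusted networks.
SSH_OPTIONS=(-o BatchMode=yes -o ConnectTimeout=10 -o StrictHostKeyChecking=no)
# Function to check logs for hardware errors
check_hardware_errors() {
    local server=$1
    echo "Checking hardware errors on $server..."
    # Loop through possible error patterns
    for pattern in "${ERROR_PATTERNS[@]}"; do
        echo "Looking for pattern '$pattern'..."
        # Use ';' instead of '||' so the log files are searched even when dmesg already matched.
        # Errors from missing files or restricted dmesg access are discarded.
        ssh "${SSH_OPTIONS[@]}" "$server" "dmesg 2>/dev/null | grep -i '$pattern'; grep -i '$pattern' /var/log/syslog /var/log/kern.log 2>/dev/null"
    done
    echo "Check completed for $server."
}
# Loop through each server
for server in "${SERVERS[@]}"; do
    check_hardware_errors "$server"
done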
How the script works:
1. SERVERS: List your server names or IPs in the SERVERS array.
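2. ERROR_PATTERNS: Common hardware-related error terms are stored in this array and matched case-insensitively. Modify these terms based on your specific hardware or error messages. Short terms such as mce also match inside longer words, so expect some false positives.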
3. SSH_OPTIONS: Modify the SSH options if necessary to handle key exchange or login issues.
4. check_hardware_errors: This function uses dmesg or logs (/var/log/syslog, /var/log/kern.log) to search for hardware-related issues by grepping error patterns.
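As a possible refinement of the pattern handling described above, the patterns can be joined into one extended regular expression so that a single grep pass covers all of them. This is a sketch under assumptions: build_regex and filter_hardware_errors are hypothetical helper names, and the filter runs grep locally on whatever text is piped into it. The -w flag requires whole-word matches, which keeps mce from matching inside unrelated words.

```shell
#!/bin/bash
# Join all arguments with '|' to form one extended-regex alternation
build_regex() {
    local IFS='|'
    printf '%s\n' "$*"
}

ERROR_PATTERNS=("hardware error" "Machine check error" "mce" "ECC error" "CPU error" "Memory error")
REGEX=$(build_regex "${ERROR_PATTERNS[@]}")

# Filter stdin: -E enables alternation, -i ignores case,
# -w only accepts matches that start and end on word boundaries
filter_hardware_errors() {
    grep -Eiw "$REGEX"
}
```

It could be used as ssh "$server" dmesg | filter_hardware_errors, which needs one SSH connection per server instead of one per pattern.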
Running the script:
1. Ensure it has execution permissions:
chmod +x check_hardware_errors.sh
2. Run the script:
./check_hardware_errors.sh
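This runs through the list of servers and searches for potential hardware errors. Each run is a one-time snapshot of the current logs, so for near-real-time monitoring, run it on a schedule (for example, from cron). Make sure the account you connect with has SSH access and permission to run dmesg and read the log files on every server.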
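For output that is closer to real time, one option is to keep an SSH session open and follow the kernel log as new messages arrive. The sketch below is an illustration, not the script above: tag_hardware_errors and follow_host are hypothetical names, the regex is an assumed starting point, and dmesg --follow may need root on systems that restrict access to the kernel log.

```shell
#!/bin/bash
# Read log lines on stdin and print only those that look like hardware errors,
# prefixed with the host name. The regex is an illustrative assumption.
tag_hardware_errors() {
    local host=$1 line
    local regex='hardware error|machine check|ecc error|memory error'
    # --line-buffered makes grep emit each match immediately instead of in blocks
    grep --line-buffered -Ei "$regex" | while IFS= read -r line; do
        printf '[%s] %s\n' "$host" "$line"
    done
}

# Follow the kernel log on one host; the pipeline runs until interrupted
follow_host() {
    local host=$1
    ssh -o BatchMode=yes "$host" 'dmesg --follow' | tag_hardware_errors "$host"
}
```

To watch several servers at once, each follow_host call can be started in the background (follow_host server1 & follow_host server2 &) followed by wait, and stopped with Ctrl+C.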
