The best systems administrators are set apart by their efficiency.
And if an efficient systems administrator can do a task in 10 minutes
that would take another mortal two hours to complete, then the
efficient systems administrator should be rewarded (paid more) because
the company is saving time, and time is money, right?
The trick is to prove your efficiency to management. While I won't attempt to cover that
trick in this article, I will give you 10 essential gems from the lazy
admin's bag of tricks. These tips will save you time—and even if you
don't get paid more money to be more efficient, you'll at least have
more time to play Halo.
The newbie states that when he pushes the Eject button on the DVD
drive of a server running a certain Redmond-based operating system, it
will eject immediately. He then complains that, in most enterprise
Linux servers, if a process is running in that directory, then the
ejection won't happen. For too long as a Linux administrator, I would
reboot the machine and get my disk on the bounce if I couldn't figure
out what was running and why it wouldn't release the DVD drive. But
this is ineffective.
Here's how you find the process that holds your DVD drive and eject
it to your heart's content: First, simulate it. Stick a disk in your
DVD drive, open up a terminal, and mount the DVD drive:
# mount /media/cdrom
# cd /media/cdrom
# while [ 1 ]; do echo "All your drives are belong to us!"; sleep 30; done
Now open up a second terminal and try to eject the DVD drive:
# eject
You'll get a message like:
umount: /media/cdrom: device is busy
Before you free it, let's find out who is using it.
# fuser /media/cdrom
You see the process was running and, indeed, it is our fault we cannot eject the disk.
Now, if you are root, you can exercise your godlike powers and kill processes:
# fuser -k /media/cdrom
Boom! Just like that, freedom. Now solemnly unmount the drive:
# eject
fuser is good.
Try this:
# cat /bin/cat
Behold! Your terminal looks like garbage. Everything you type looks like you're looking into the Matrix. What do you do?
You type reset. But wait, you say, typing reset is too close to typing reboot or shutdown. Your palms start to sweat, especially if you are doing this on a production machine.
Rest assured: You can do it with the confidence that no machine will be rebooted. Go ahead, do it:
# reset
Now your screen is back to normal. This is much better than closing
the window and then logging in again, especially if you just went
through five machines to SSH to this machine.
David, the high-maintenance user from product engineering, calls: "I
need you to help me understand why I can't compile supercode.c on these
new machines you deployed."
"Fine," you say. "What machine are you on?"
David responds: "Posh." (Yes, this fictional company has named its
five production servers in honor of the Spice Girls.) OK, you say. You
exercise your godlike root powers and on another machine become David:
# su - david
Then you go over to posh:
# ssh posh
Once you are there, you run:
# screen -S foo
Then you holler at David:
"Hey David, run the following command on your terminal: # screen -x foo."
This will cause your and David's sessions to be joined together in
the holy Linux shell. You can type or he can type, but you'll both see
what the other is doing. This saves you from walking to the other floor
and lets you both have equal control. The benefit is that David can
watch your troubleshooting skills and see exactly how you solve
problems.
At last you both see what the problem is: David's compile script
hard-coded an old directory that does not exist on this new server. You
mount it, recompile, solve the problem, and David goes back to work.
You then go back to whatever lazy activity you were doing before.
The one caveat to this trick is that you both need to be logged in as the same user. Other cool things you can do with the screen command include having multiple windows and split screens. Read the man pages for more on that.
But I'll give you one last tip while you're in your screen session. To detach from it and leave it open, type Ctrl-A D. (That is, hold down the Ctrl key and strike the A key. Then push the D key.)
You can then reattach by running the screen -x foo command again.
You forgot your root password. Nice work. Now you'll just have to
reinstall the entire machine. Sadly enough, I've seen more than a few
people do this. But it's surprisingly easy to get on the machine and
change the password. This doesn't work in all cases (like if you made a
GRUB password and forgot that too), but here's how you do it in a
normal case with a CentOS Linux example.
First reboot the system. When it reboots you'll come to the GRUB
screen as shown in Figure 1. Press an arrow key so that you stay on
this screen instead of proceeding all the way to a normal boot.
Next, use the arrow keys to select the kernel that will boot, and type E to edit its entry. You'll then see something like Figure 2:
Use the arrow keys again to highlight the line that begins with kernel, and press E to edit the kernel parameters. When you get to the screen shown in Figure 3, simply append the number 1 to the arguments:
Then press Enter and then B, and the kernel will boot up to single-user mode. Once there, you can run the passwd command to change the password for user root:
sh-3.00# passwd
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully
Now you can reboot, and the machine will boot up with your new password.
Many times I'll be at a site where I need remote support from
someone who is blocked on the outside by a company firewall. Few people
realize that if you can get out to the world through a firewall, then
it is relatively easy to open a hole so that the world can come in to
you.
In its crudest form, this is called "poking a hole in the firewall." I'll call it an SSH back door. To use it, you'll need a machine on the Internet that you can use as an intermediary.
In our example, we'll call our machine blackbox.example.com. The
machine behind the company firewall is called ginger. Finally, the
machine that technical support is on will be called tech. Figure 4
explains how this is set up.
Here's how to proceed:
- Check that what you're doing is allowed, but make sure you ask the
right people. Most people will cringe that you're opening the firewall,
but what they don't understand is that it is completely encrypted.
Furthermore, someone would need to hack your outside machine before
getting into your company. Instead, you may belong to the school of
"ask-for-forgiveness-instead-of-permission." Either way, use your
judgment and don't blame me if this doesn't go your way.
- SSH from ginger to blackbox.example.com with the -R flag. I'll
assume that you're the root user on ginger and that tech will need the
root user ID to help you with the system. With the -R flag, you'll
forward connections made to port 2222 on blackbox to port 22 on ginger.
This is how you set up an SSH tunnel. Note that only SSH traffic can
come into ginger: You're not putting ginger out on the Internet naked.
You can do this with the following syntax:
~# ssh -R 2222:localhost:22
Once you are into blackbox, you just need to stay logged in. I usually enter a command like:
thedude@blackbox:~$ while [ 1 ]; do date; sleep 300; done
to keep the session busy. Then minimize the window.
- Now instruct your friends at tech to SSH as thedude into blackbox
without using any special SSH flags. You'll have to give them your
password:
root@tech:~# ssh
- Once tech is on the blackbox, they can SSH to ginger using the following command:
thedude@blackbox:~$ ssh -p 2222 root@localhost
- Tech will then be prompted for a password. They should enter the root password of ginger.
- Now you and support from tech can work together and solve the problem. You may even want to use screen together! (See Trick 3.)
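As a sketch of the tunnel opened from ginger, the helper below prints a complete command line instead of running it. The destination thedude@blackbox.example.com is an assumption pieced together from the names used in this trick, and the helper name reverse_tunnel_cmd is hypothetical. The -N flag and the ServerAliveInterval setting are standard OpenSSH options, used here in place of the keep-busy loop.

```shell
# reverse_tunnel_cmd LOGIN REMOTE_PORT LOCAL_PORT
# Print an ssh command that opens REMOTE_PORT on the intermediary and forwards
# it to LOCAL_PORT on the machine where the command is run.
# (reverse_tunnel_cmd is a hypothetical helper name.)
reverse_tunnel_cmd() {
    # -N: run no remote command, only hold the tunnel open.
    # -o ServerAliveInterval=60: send a keepalive every 60 seconds.
    echo "ssh -N -o ServerAliveInterval=60 -R $2:localhost:$3 $1"
}

# Assumed login on the intermediary; adjust to your own account and host.
reverse_tunnel_cmd thedude@blackbox.example.com 2222 22
# prints: ssh -N -o ServerAliveInterval=60 -R 2222:localhost:22 thedude@blackbox.example.com
```

Printing the command first makes it easy to read back to whoever approved the tunnel before you paste it into a shell on ginger.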
VNC, or Virtual Network Computing, has been around a long time. I
typically find myself needing to use it when the remote server has some
type of graphical program that is only available on that server.
For example, suppose in Trick 5, ginger is a storage server. Many
storage devices come with a GUI
program to manage the storage controllers. Often these GUI management
tools need a direct connection to the storage through a network that is
at times kept in a private subnet. Therefore, the only way to access
this GUI is to do it from ginger.
You can try SSH'ing to ginger with the -X option and launch it that
way, but many times the bandwidth required is too much
and you'll get frustrated waiting. VNC is a much more network-friendly
tool and is readily available for nearly all operating systems.
Let's assume that the setup is the same as in Trick 5, but you want
tech to be able to get VNC access instead of SSH. In this case, you'll
do something similar but forward VNC ports instead. Here's what you do:
- Start a VNC server session on ginger. This is done by running something like:
root@ginger:~# vncserver -geometry 1024x768 -depth 24 :99
The options tell the VNC server to start up with a resolution of
1024x768 and a pixel depth of 24 bits per pixel. If you are using a
really slow connection, a depth of 8 may be a better option. Using :99
specifies the display number, which determines the port the VNC server will be accessible from. VNC ports start at 5900, so specifying :99
means the server is accessible from port 5999.
When you start the session, you'll be asked to specify a password.
The user ID will be the same user that you launched the VNC server
from. (In our case, this is root.)
- SSH from ginger to blackbox.example.com forwarding the port 5999 on
blackbox to ginger. This is done from ginger by running the command:
root@ginger:~# ssh -R 5999:localhost:5999
Once you run this command, you'll need to keep this SSH session open
in order to keep the port forwarded to ginger. At this point if you
were on blackbox, you could now access the VNC session on ginger by
just running:
thedude@blackbox:~$ vncviewer localhost:99
That would forward the port through SSH to ginger. But we're
interested in letting tech get VNC access to ginger. To accomplish
this, you'll need another tunnel.
- From tech, you open a tunnel via SSH to forward your port 5999 to port 5999 on blackbox. This would be done by running:
root@tech:~# ssh -L 5999:localhost:5999
This time the SSH flag we used was -L, which instead of pushing 5999
to blackbox, pulled from it. Once you are in on blackbox, you'll need
to leave this session open. Now you're ready to VNC from tech!
- From tech, VNC to ginger by running the command:
root@tech:~# vncviewer localhost:99
Tech will now have a VNC session directly to ginger.
While the effort might seem like a bit much to set up, it beats
flying across the country to fix the storage arrays. Also, if you
practice this a few times, it becomes quite easy.
Let me add a trick to this trick: If tech were running the Windows®
operating system and didn't have a command-line SSH client, then tech
could run PuTTY. PuTTY can be set to forward SSH ports by looking in the
options in the sidebar. If the port were 5902 instead of our example of
5999, then you would enter something like in Figure 5.
If this were set up, then tech could VNC to localhost:2 just as if tech were running the Linux operating system.
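The display-to-port arithmetic used throughout this trick (display :99 is port 5999, display :2 is port 5902) can be checked with a one-line helper. This is only a sketch; the name vnc_port is hypothetical.

```shell
# vnc_port DISPLAY: print the TCP port for a VNC display number.
# The mapping is 5900 plus the display number.
# (vnc_port is a hypothetical helper name.)
vnc_port() {
    echo $((5900 + $1))
}

vnc_port 99   # prints 5999
vnc_port 2    # prints 5902
```

Knowing the port is what lets you pick the right numbers for the -R and -L forwards when the VNC server is started on a different display.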
Imagine this: Company A has a storage server named ginger and it is
being NFS-mounted by a client node named beckham. Company A has decided
they really want to get more bandwidth out of ginger because they have
lots of nodes they want to have NFS mount ginger's shared filesystem.
The most common and cheapest way to do this is to bond two Gigabit
Ethernet NICs together. This is cheapest because usually you have an
extra on-board NIC and an extra port on your switch somewhere.
So they do this. But now the question is: How much bandwidth do they really have?
Gigabit Ethernet has a theoretical limit of 125MBps. Where does that number come from? Well,
1Gb = 1000Mb; 1000Mb/8 = 125MB; "b" = "bits," "B" = "bytes"
But what is it that we actually see, and what is a good way to
measure it? One tool I suggest is iperf. You can grab iperf like this:
# wget
You'll need to install it on a shared filesystem that both ginger
and beckham can see, or compile and install it on both nodes. I'll
compile it in the home directory of the bob user, which is viewable on
both nodes:
tar zxvf iperf*gz
cd iperf-2.0.2
./configure --prefix=/home/bob/perf
make
make install
On ginger, run:
# /home/bob/perf/bin/iperf -s -f M
This machine will act as the server and print out performance speeds in MBps.
On the beckham node, run:
# /home/bob/perf/bin/iperf -c ginger -P 4 -f M -w 256k -t 60
You'll see output in both screens telling you what the speed is. On
a normal server with a Gigabit Ethernet adapter, you will probably see
about 112MBps. This is normal as bandwidth is lost in the TCP stack and
physical cables. By connecting two servers back-to-back, each with two
bonded Ethernet cards, I got about 220MBps.
In reality, what you see with NFS on bonded networks is around
150-160MBps. Still, this gives you a good indication that your
bandwidth is going to be about what you'd expect. If you see something
much less, then you should check for a problem.
I recently ran into a case in which the bonding driver was used to
bond two NICs that used different drivers. The performance was
extremely poor, leading to about 20MBps in bandwidth, less than they
would have gotten had they not bonded the Ethernet cards together!
A Linux systems administrator becomes more efficient by using
command-line scripting with authority. This includes crafting loops and
knowing how to parse data using utilities like awk, grep, and sed. There
are many cases where doing so takes fewer keystrokes and lessens the
likelihood of user errors.
For example, suppose you need to generate a new /etc/hosts file for
a Linux cluster that you are about to install. The long way would be to
add IP addresses in vi or your favorite text editor. However, it can be
done by taking the already existing /etc/hosts file and appending the
following to it by running this on the command line:
# P=1; for i in $(seq -w 200); do echo "192.168.99.$P n$i"; P=$(expr $P + 1);
done >>/etc/hosts
Two hundred host names, n001 through n200, will then be created with
IP addresses 192.168.99.1 through 192.168.99.200. Populating a file
like this by hand runs the risk of inadvertently creating duplicate IP
addresses or host names, so this is a good example of using the
built-in command line to eliminate user errors. Please note that this
is done in the bash shell, the default in most Linux distributions.
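A variation on that loop, offered as a sketch rather than as the method above: it lets printf do the zero-padding and writes to a scratch file so the entries can be inspected before anything is appended to /etc/hosts. The helper name gen_hosts and the scratch-file step are assumptions for illustration.

```shell
# gen_hosts COUNT: print COUNT host entries, pairing 192.168.99.N with nNNN.
# (gen_hosts is a hypothetical helper name.)
gen_hosts() {
    i=1
    while [ "$i" -le "$1" ]; do
        # %03d pads the node number with zeros: n001, n002, ...
        printf '192.168.99.%d n%03d\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Write to a scratch file first so the entries can be checked.
tmp=$(mktemp)
gen_hosts 200 > "$tmp"
head -n 1 "$tmp"    # prints: 192.168.99.1 n001
tail -n 1 "$tmp"    # prints: 192.168.99.200 n200
```

Once the scratch file looks right, appending it as root with cat "$tmp" >>/etc/hosts gives the same result as the one-liner.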
As another example, let's suppose you want to check that the memory
size is the same in each of the compute nodes in the Linux cluster. In
most cases of this sort, having a distributed or parallel shell would
be the best practice, but for the sake of illustration, here's a way to
do this using SSH.
Assume SSH is set up to authenticate without a password. Then run:
# for num in $(seq -w 200); do ssh n$num free -m | grep Mem | awk '{print $2}';
done | sort | uniq
A command line like this looks pretty terse. (It can be worse if you
put regular expressions in it.) Let's pick it apart and uncover the
mystery.
First you're doing a loop through 001-200. This padding with 0s in the front is done with the -w option to the seq command. Then you substitute the num variable to create the host you're going to SSH to. Once you have the target host, give the command to it. In this case, it's:
free -m | grep Mem | awk '{print $2}'
That command says to:
- Use the free command to get the memory size in megabytes.
- Take the output of that command and use grep to get the line that has the string Mem in it.
- Take that line and use awk to print the second field, which is the total memory in the node.
This operation is performed on every node.
Once you have performed the command on every node, the entire output of all 200 nodes is piped (|) to the sort command so that all the memory values are sorted.
Finally, you eliminate duplicates with the uniq command. This command will result in one of the following cases:
- If all the nodes, n001-n200, have the same memory size, then only
one number will be displayed. This is the size of memory as seen by
each operating system.
- If node memory size is different, you will see several memory size values.
- Finally, if the SSH failed on a certain node, then you may see some error messages.
This command isn't perfect. If you find that a value of memory is
different than what you expect, you won't know on which node it was or
how many nodes there were. Another command may need to be issued for
that.
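One way to close that gap, sketched here under assumptions: print the host name next to each value, then group the hosts by value. The helper name summarize_mem is hypothetical, and the sample lines stand in for real SSH output so the grouping can be tried without a cluster.

```shell
# summarize_mem: read "host value" lines on stdin and print, for each distinct
# value, how many hosts reported it and which ones.
# (summarize_mem is a hypothetical helper name.)
summarize_mem() {
    awk '{
        if (count[$2]++)
            hosts[$2] = hosts[$2] " " $1    # later host: append to the list
        else
            hosts[$2] = $1                  # first host seen for this value
    }
    END { for (m in count) print m, count[m], hosts[m] }' | sort -n
}

# Sample data in place of the SSH loop (values are made up for illustration).
printf 'n001 7982\nn002 7982\nn003 3950\n' | summarize_mem
# prints:
# 3950 1 n003
# 7982 2 n001 n002
```

On a real cluster the input could come from a loop such as: for num in $(seq -w 200); do echo "n$num $(ssh n$num free -m | awk '/^Mem:/ {print $2}')"; done | summarize_mem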
What this trick does give you, though, is a fast way to check for
something and quickly learn if something is wrong. This is its real
value: Speed to do a quick-and-dirty check.
Some software prints error messages to the console that may not
necessarily show up on your SSH session. Using the vcs devices can let
you examine these. From within an SSH session, run the following
command on a remote server: # cat /dev/vcs1. This will show you what
is on the first console. You can also look at the other
virtual terminals using 2, 3, etc. If a user is typing on the remote
system, you'll be able to see what he typed.
In most data farms, using a remote terminal server, KVM, or even
Serial Over LAN is the best way to view this information; it also
provides the additional benefit of out-of-band viewing capabilities.
Using the vcs device provides a fast in-band method that may be able to
save you some time from going to the machine room and looking at the
console.
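The vcs devices hold the screen contents as one unbroken run of characters with no newlines, so the output of cat can be hard to read. A minimal sketch of wrapping it into rows follows; the helper name show_console is hypothetical, and the default width of 80 columns is an assumption that should be checked against the real console.

```shell
# show_console FILE [COLS]: wrap a console dump at COLS characters per row.
# COLS defaults to 80, an assumed console width.
# (show_console is a hypothetical helper name.)
show_console() {
    fold -w "${2:-80}" "$1"
    echo    # the dump has no trailing newline, so end the last row here
}

# Small stand-in file so the wrapping can be tried without a console device.
demo=$(mktemp)
printf 'abcdef' > "$demo"
show_console "$demo" 3
# prints:
# abc
# def
rm -f "$demo"
```

On the remote server the call would be show_console /dev/vcs1, run as a user who can read the device.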
In Trick 8, you saw an example of using the command line to get information about
the total memory in the system. In this trick, I'll offer up a few
other methods to collect important information from the system you may
need to verify, troubleshoot, or give to remote support.
First, let's gather information about the processor. This is easily done as follows:
# cat /proc/cpuinfo
This command gives you information on the processor speed, quantity, and model. Using grep in many cases can give you the desired value.
A check that I do quite often is to ascertain the quantity of
processors on the system. So, if I have purchased a dual processor
quad-core server, I can run:
# cat /proc/cpuinfo | grep processor | wc -l
I would then expect to see 8 as the value. If I don't, I call up the vendor and tell them to send me another processor.
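The same count can be had from a single grep, sketched below. The helper name count_cpus and its optional file argument are assumptions, added so the function can also be pointed at a saved copy of cpuinfo. Note that the figure is the number of logical processors the kernel reports, which can be higher than the number of physical cores when hyperthreading is enabled.

```shell
# count_cpus [FILE]: count "processor" entries, one per logical CPU.
# FILE defaults to /proc/cpuinfo. (count_cpus is a hypothetical helper name.)
count_cpus() {
    # -c prints the number of matching lines; ^ anchors the match to the
    # start of the line so other lines containing the word are not counted.
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Only query the live system where /proc/cpuinfo is available.
if [ -r /proc/cpuinfo ]; then
    count_cpus
fi
```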
Another piece of information I may require is disk information. You can get this with the df command. I usually add the -h flag so that I can see the output in gigabytes or megabytes. # df -h also shows how the disk was partitioned.
And to end the list, here's a way to look at the firmware of your
system—a method to get the BIOS level and the firmware on the NIC.
To check the BIOS version, you can run the dmidecode command. Unfortunately, you can't easily grep for the information, so paging through the output with less, though less efficient, is one way to do this. On my Lenovo T61 laptop, the output looks like this:
# dmidecode | less
...
BIOS Information
Vendor: LENOVO
Version: 7LET52WW (1.22 )
Release Date: 08/27/2007
...
This is much more efficient than rebooting your machine and looking at the POST output.
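dmidecode can also print a single field with its -s option (for example, dmidecode -s bios-version), which avoids paging altogether. Where only a saved dump is at hand, a small awk filter does a similar job. The sketch below assumes the output layout shown in the listing above, and the helper name bios_version is hypothetical.

```shell
# bios_version: read dmidecode output on stdin and print the Version field
# from the "BIOS Information" block. (bios_version is a hypothetical name.)
bios_version() {
    awk -F': ' '
        /BIOS Information/    { in_bios = 1; next }   # entered the BIOS block
        in_bios && /Version:/ { print $2; exit }      # first Version line in it
    '
}

# Sample input copied from the listing above, in place of a live dmidecode run.
printf 'BIOS Information\n\tVendor: LENOVO\n\tVersion: 7LET52WW (1.22 )\n\tRelease Date: 08/27/2007\n' | bios_version
# prints: 7LET52WW (1.22 )
```

On a live system the input would come from dmidecode itself, run as root: dmidecode | bios_version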
To examine the driver and firmware versions of your Ethernet adapter, run ethtool:
# ethtool -i eth0
driver: e1000
version: 7.3.20-k2-NAPI
firmware-version: 0.3-0
There are thousands of tricks you can learn from someone who's an expert at the command line. The best ways to learn are to:
- Work with others. Share screen sessions and watch how others
work—you'll see new approaches to doing things. You may need to swallow
your pride and let other people drive, but often you can learn a lot.
- Read the man pages. Seriously; reading man pages, even on commands
you know like the back of your hand, can provide amazing insights. For
example, did you know you can do network programming with awk?
- Solve problems. As the system administrator, you are always solving
problems whether they are created by you or by others. This is called
experience, and experience makes you better and more efficient.
I hope at least one of these tricks helped you learn something you
didn't know. Essential tricks like these make you more efficient and
add to your experience, but most importantly, tricks give you more free
time to do more interesting things, like playing video games. And the
best administrators are lazy because they don't like to work. They find
the fastest way to do a task and finish it quickly so they can continue
in their lazy pursuits.