Working with Computers
Much of the information on this page comes from Dr. Bill Miller III at A. T. Still University.
Here are a few tips that might be useful as you get used to working with computers in a research setting.
What Computer am I Working on?
When you can easily ssh from one computer to another, it is easy to get confused and not know which computer you are working on. Thus, the hostname command was created to print out the name of the host computer you are actively working on.
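As a small illustration, hostname can be used inside a script to confirm where the script is running. This is only a sketch: the name stored in expected_host is an assumption made for the example, not a name this page guarantees your machine reports.

```shell
# Record which machine this shell is running on.
current_host=$(hostname)
echo "Running on: ${current_host}"

# Assumption for illustration: the machine you meant to use reports a name starting with "acme".
expected_host="acme"
case "$current_host" in
  "$expected_host"*) echo "This is the expected machine." ;;
  *) echo "Warning: expected ${expected_host}, but this is ${current_host}." ;;
esac
```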
Remotely Accessing Computers or Clusters
The great part of computational work is that you should never have to leave your desk to do any work. You can be logged into a computer halfway around the world and you should be able to work as if you were there. The ssh command is what allows us this tremendous flexibility. The general syntax for ssh is
ssh [options] username@remote_computer_name
If your username is the same on your local computer and on the computer you are trying to connect to, then you do not need to include the username@ part of the command. Instead, it would just be
ssh [options] remote_computer_name
For example,
ssh acme.chem.colostate.edu
In the above example, no options were included. I will not go into detail about all the different options ssh has, but I did want to mention the options for X-forwarding. If you include a -X or -Y flag between ssh and your destination, you will enable X-forwarding during your connection. What does that mean? That means you will be able to bring up GUIs (Graphical User Interfaces) on the remote computer and they will display on your local computer screen. So, for example, to ssh to ACME using X-forwarding, type
ssh -Y acme.chem.colostate.edu
Be aware that because these graphics are running remotely, anything you try to do will be slower than if you were doing it on your own computer.
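As a final note, there are subtle differences between the -X and -Y flags: -X requests untrusted forwarding, which may apply X11 security restrictions to the remote programs, while -Y requests trusted forwarding without those restrictions. For practical purposes you should rarely notice a difference, so you can usually use either one.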
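The username rule described in this section can be sketched as a small helper that builds the ssh command line without running it. The function name ssh_command is hypothetical (it is not part of ssh), and the host and username are simply the examples used on this page.

```shell
# ssh_command is a hypothetical helper that only prints the command it would run.
ssh_command() {
  remote_user=$1
  remote_host=$2
  if [ "$remote_user" = "$(whoami)" ]; then
    # Same username on both machines: the username@ part can be dropped.
    echo "ssh ${remote_host}"
  else
    echo "ssh ${remote_user}@${remote_host}"
  fi
}

same_user_cmd=$(ssh_command "$(whoami)" acme.chem.colostate.edu)
other_user_cmd=$(ssh_command CamRam acme.chem.colostate.edu)
echo "$same_user_cmd"
echo "$other_user_cmd"
```

Printing the command first is a cheap way to check what you are about to connect to before actually running it.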
Setting up an ssh Key
Information in this section comes from here.
When logging into different computers, you have to type your password each time unless you set up an ssh key. If you set up this key, you will be able to ssh to another computer directly, without the need to type your password each time.
First, you need to generate an ssh key on your computer, which will also be helpful when using GitHub, for example. More information about setting up the ssh key can be found on the GitHub site.
To check to see if you already have an existing ssh key, type:
ls ~/.ssh
This shows what (if any) ssh keys you have. Each key is a pair of files, and the public half ends in .pub.
Once you have an ssh key created, you can apply that key to different computers/servers with ssh-copy-id username@remote_host.
For example, if your username is CamRam and you want to set up an ssh key for ACME, type:
ssh-copy-id CamRam@acme.chem.colostate.edu
And you'll be able to log in to ACME without typing your password.
Note
This only applies when logging in from the computer that holds the ssh key you set up with the ssh-copy-id command.
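If ls ~/.ssh shows no key, one has to be generated first. The sketch below uses ssh-keygen, but writes a throwaway key into a temporary directory so that nothing in ~/.ssh is touched. The file name, the comment, and the empty passphrase are assumptions made only so the demo runs unattended.

```shell
# Generate a throwaway key pair in a temporary directory.
key_dir=$(mktemp -d)
key_file="${key_dir}/id_ed25519_demo"   # hypothetical file name for this demo

if command -v ssh-keygen >/dev/null 2>&1; then
  # -t: key type, -N "": empty passphrase (demo only), -C: comment, -f: output file, -q: quiet
  ssh-keygen -t ed25519 -N "" -C "demo key" -f "$key_file" -q
  ls "$key_dir"   # lists the private key and the matching .pub file
  keygen_status="created"
else
  keygen_status="ssh-keygen not installed"
fi
echo "$keygen_status"
```

For a real key you would normally run ssh-keygen without -f and -N, accept the default location inside ~/.ssh, and choose a passphrase when prompted.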
Who am I Logged In As?
In case you find yourself questioning your very existence and who you are as a person, you can always ask your computer. Typing
whoami
will print the username you used to log in to your computer.
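A minimal sketch of using whoami in a script, here to warn before doing anything as root. The messages are illustrative only.

```shell
# Store the current username and react to it.
current_user=$(whoami)
if [ "$current_user" = "root" ]; then
  echo "You are logged in as root; be careful with destructive commands."
else
  echo "You are logged in as ${current_user}."
fi
```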
What Processes are Running in my Terminal Window
ps
The ps command is similar to the top command, except instead of viewing all the processes that are running on your computer, ps will only display the processes that are running in your current shell/window. So there are significantly fewer processes shown using ps than top, but if you are running a bunch of processes that are executing the same command, ps may be a better way of displaying them if you need to terminate one of them. ps is also not dynamic. When you type ps, the current processes are just printed to the screen and the command prompt is available for another unix command.
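The situation described above, several processes running the same command with one of them needing to be terminated, can be sketched as follows. The two sleep jobs are stand-ins for real work, and the -o and -p options are assumed to be available in your version of ps (they are in the common Linux and macOS versions).

```shell
# Start two background jobs that run the same command.
sleep 30 &
first_pid=$!
sleep 30 &
second_pid=$!

# List the processes attached to the current shell/window.
ps

# Show only the PID and command name for the two jobs.
ps -o pid,comm -p "${first_pid},${second_pid}"

# Terminate one job by its PID and leave the other running.
kill "$first_pid"
wait "$first_pid" 2>/dev/null

# Check which job survived, then clean up the remaining one.
if kill -0 "$first_pid" 2>/dev/null; then first_alive=yes; else first_alive=no; fi
if kill -0 "$second_pid" 2>/dev/null; then second_alive=yes; else second_alive=no; fi
echo "first job alive: ${first_alive}, second job alive: ${second_alive}"
kill "$second_pid"
```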
top
Occasionally it will be helpful to determine what processes are currently running on your computer. For this situation, we have the top command. top will display, in your Terminal window, all of the processes currently running on the computer.
The processes are listed in descending order of %CPU usage. The display is dynamic and updates every ~3 seconds. While top is being displayed, if you press 1 you will see the load on each specific processor.
There are many columns of information displayed using top, some of which are self-explanatory and some of which aren’t important to us. The PID column is an identifying number for each process. If you need to kill one of these jobs while top is running, press k and a prompt asking for the PID to kill will be printed just above the process list. You can enter the PID number and press enter to kill any of the jobs that are running.
You can also see if any other users are running processes on your computer (someone might decide to try to hijack your CPUs or GPUs if they don’t think you are using them enough). top also allows you to view the percentage CPU (%CPU) and memory (%MEM) usage for each process, the amount of CPU time the job has used (the TIME+ column, which is accumulated processor time rather than normal wall-clock time), and the name of the command that is running.
Once you have finished examining top, you can terminate it by just pressing q on the keyboard to quit (or by pressing ctrl+c).
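When you only want a single snapshot, for example to save in a log file, top can be run non-interactively. This sketch assumes the Linux (procps) version of top, where -b means batch mode and -n 1 means one update; top on macOS uses different flags.

```shell
# Take one non-interactive snapshot of top and keep only the first 15 lines.
# Assumption: Linux procps top (-b: batch mode, -n 1: a single iteration).
top_snapshot=$(top -b -n 1 2>/dev/null | head -n 15)
echo "$top_snapshot"
```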
Learn About the Computer
Depending on the machine that you are working on, the architecture/setup of the computer may differ. Different machines can also have different numbers of CPUs available for use. One way to find this kind of information about the machine you are working on is with the command lscpu. When you are on a Linux machine, typing this command will print a lot of information about the computer to the Terminal window. For example, typing
lscpu
on Fireball of the Paton Lab gives the result:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
Stepping: 7
CPU MHz: 3572.460
CPU max MHz: 3900.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
This command should give you most of the CPU-related information that you are looking for about a computer.
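The CPU(s) count in the Fireball output follows from three other lines: sockets, cores per socket, and threads per core. The sketch below redoes that arithmetic with the values copied from the output, then, if lscpu is installed, prints the same four fields for whatever machine you are on.

```shell
# Values copied from the Fireball lscpu output.
sockets=2
cores_per_socket=24
threads_per_core=2

physical_cores=$((sockets * cores_per_socket))
logical_cpus=$((physical_cores * threads_per_core))
echo "Physical cores: ${physical_cores}"
echo "Logical CPUs:   ${logical_cpus}"   # matches the CPU(s): 96 line

# Print the same fields for the current machine (Linux only).
if command -v lscpu >/dev/null 2>&1; then
  lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)):'
fi
```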
which - What Program am I Using?
The which command can be extremely useful for determining:
if a command is in your PATH and
the exact path to the executable
For example, if you want to know if the vmd program is in your PATH, type
which vmd
which should give the result
/usr/local/bin/vmd
This tells you that vmd is in your PATH, so if you just type vmd the VMD program will open, and it tells you the VMD program that will open is located in /usr/local/bin. This is important because sometimes you might have multiple copies of the same program on your computer. If I wanted to use the beta version of the VMD program I could download it from the VMD website and install it on my computer. I would then have to put the beta VMD bin directory in my PATH ahead of /usr/local/bin (using the instructions outlined in the $PATH section) so that when I type vmd I will open the beta VMD instead of the native VMD in /usr/local/bin/.
Typing which vmd will let you know exactly which VMD you are attempting to use.
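The effect of PATH order can be tried safely with two fake vmd scripts in temporary directories. Everything here is a stand-in: the scripts only print a label, and command -v (a shell builtin that reports the same path which would for an executable) is used so the sketch does not depend on which being installed.

```shell
# Create two hypothetical "vmd" executables in separate temporary directories.
native_dir=$(mktemp -d)
beta_dir=$(mktemp -d)
printf '#!/bin/sh\necho "native vmd"\n' > "${native_dir}/vmd"
printf '#!/bin/sh\necho "beta vmd"\n' > "${beta_dir}/vmd"
chmod +x "${native_dir}/vmd" "${beta_dir}/vmd"

# With only the native directory added, that copy is the one found.
PATH="${native_dir}:${PATH}"
hash -r
found_before=$(command -v vmd)

# Prepending the beta directory makes it win, because PATH is searched left to right.
PATH="${beta_dir}:${PATH}"
hash -r
found_after=$(command -v vmd)

echo "before: ${found_before}"
echo "after:  ${found_after}"
```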
Disk Usage Details
Our computers only have a limited amount of storage, so it is useful to know the disk usage details for our entire computer, as well as folders on our computer. For this purpose we use the du command. Typing the command du -h will print the size of every subfolder within your current directory in human-readable units (K, M, G) and print out the total disk usage at the end. If you have many directories and subdirectories within your current folder, the list printed to the screen can become quite large.
The -s flag is intended to help remove all of the mess and just give you the final total (for example, du -sh). This is useful if you only care about the total size of a directory and don't care about the size of individual subfolders.
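The difference between du -h and du -sh is easiest to see on a small directory. The sketch below builds one in a temporary location; the folder and file names are made up for the demo.

```shell
# Build a small demo directory with two subfolders.
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/inputs" "${demo_dir}/outputs"
dd if=/dev/zero of="${demo_dir}/outputs/big.log" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="${demo_dir}/inputs/small.com" bs=1024 count=100 2>/dev/null

# One line per subfolder, followed by the total for the whole directory.
du -h "$demo_dir"

# Only the total.
du -sh "$demo_dir"

# The same total in kilobytes, convenient for use in scripts.
total_kb=$(du -sk "$demo_dir" | cut -f1)
echo "Total: ${total_kb} KB"
```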
Search for Files
The find command searches a directory/path for files that match the expression provided by the user. The general syntax is
find [path] expression
This command has a LOT of options.
An example find command is shown below to give you an idea of what find can do.
find $HOME -mtime 0
This command will search for files in your home directory that have been modified in the last 24 hours.
find is also capable of executing other commands on the filenames it finds:
find ./ -maxdepth 3 -mindepth 1 -name '*.com' -exec cp {} ../../MoveFilesHere \;
This command will search all folders 1-3 levels down the file tree, select all of the files that end in ".com", and run the cp command to copy those files into a new folder that you made. The -exec portion must not be wrapped in quotes: find replaces {} with each filename it finds, and the escaped semicolon (\;) marks the end of the command. The portion after -exec can be any command you would like to run on the files you've found.
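To try the -exec form without touching real data, the sketch below builds a small directory tree in a temporary location and copies the ".com" files out of it. The folder and file names are invented for the demo; only MoveFilesHere is borrowed from the example in this section.

```shell
# Build a throwaway tree with .com files at depths 1, 2, and 3, plus one other file.
work_dir=$(mktemp -d)
mkdir -p "${work_dir}/project/run1/step1" "${work_dir}/MoveFilesHere"
touch "${work_dir}/project/top.com" \
      "${work_dir}/project/run1/a.com" \
      "${work_dir}/project/run1/step1/b.com" \
      "${work_dir}/project/run1/notes.txt"

# Copy every .com file found 1-3 levels below project/ into MoveFilesHere.
find "${work_dir}/project" -maxdepth 3 -mindepth 1 -name '*.com' \
  -exec cp {} "${work_dir}/MoveFilesHere" \;

# Count what arrived in the destination folder.
copied_count=$(find "${work_dir}/MoveFilesHere" -type f -name '*.com' | wc -l | tr -d ' ')
echo "Copied ${copied_count} files"
ls "${work_dir}/MoveFilesHere"
```

Because cp is used, the original files stay where they were; swapping in mv would move them instead.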
Tell the Computer to Wait
The sleep command simply tells the computer to pause and wait for a user-specified amount of time. I have really only found this useful when writing scripts. The command does not return anything or print any output. The general syntax is
sleep #
where the number is the time you want the computer to wait for in units of seconds. To make the computer wait 3 seconds you would type
sleep 3
Wasn’t that useful?
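In scripts, sleep is typically used to wait between checks on something else. Here is a minimal sketch that polls for a marker file once per second; the marker file and the two-second background job are stand-ins for a real calculation.

```shell
# Hypothetical marker file that a long-running job creates when it finishes.
tmp_dir=$(mktemp -d)
flag_file="${tmp_dir}/job.done"

# Stand-in for the long job: create the marker after 2 seconds, in the background.
( sleep 2; touch "$flag_file" ) &

# Check once per second until the marker file exists.
checks=0
until [ -e "$flag_file" ]; do
  checks=$((checks + 1))
  echo "Not finished yet (check ${checks}); waiting 1 second..."
  sleep 1
done
echo "Job finished after ${checks} checks."
```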