Working with Files

We are constantly working with files while navigating the terminal during our research, here are some of the ways that we can do that:

touch - Crating an Empty File

touch is a very simple program that allows you to create a new, empty file. The general syntax is

touch [options] filename

Without any options, the file will simply be created with no contents. You can also use the available options with the touch command to update the date/time the file was modified with the -m command. On some clusters, especially supercomputers, they have built-in commands that automatically delete any files that have not been modified/changed in the last 30 days. With touch you could setup a script that will automatically touch all your files and update the modified date/times so the supercomputer does not automatically delete them.

vi - Reading and Writing in the Terminal

General Concepts

You should already know the basics of using vi (also known as vim) for text editing, but I wanted to cover a few more useful parts of vi that you may not be aware of. You should already know that you can open vi to create or edit any file by typing

vi filename

Once inside the vi text editor, you can navigate around using letters on your keyboard

  • H key – move left

  • J key – move down

  • K key – move up

  • L key – move right

These will always work in vi. You can also usually just use the arrow keys on your keyboard, but occasionally you may login to a computer where vi does not recognize the arrows on your keyboard and then you will need to use the H thru L keys on the keyboard.

Below are some other helpful commands/tips that can be used within vi. Once you start working your way through these you will start to notice patterns with the commands that will hopefully make them start to make more sense.

  • To move your cursor to the end of a word, press w.

  • To move your cursor to the very end of a line/row, press Shift+4.

  • To move your cursor to the beginning of the current line/row, press 0. That is a zero, not an o.

  • To move your cursor to the last line of the file, press Shift+G.

  • To page down, press ctrl+F.

  • To page up, press ctrl+B.

  • To move your cursor down 6 lines, press 6 and then the down arrow (or J key)

  • To move your cursor up 8 lines, press 8 and then the up arrow key (or K key)

  • To move your cursor over right 5 characters, press 5 and then the right arrow key (or L key)

  • To move your cursor over left 3 characters, press 3 and then the left arrow key (or H key)

  • To delete the letter your cursor is currently on, press x.

  • To copy the current line of text, press yy.

  • To copy 4 lines (the current line plus the 3 following lines) of text, press 4yy.

  • To copy the current line and every line until the end of the file, press y, then Shift+G

  • To paste a copied line (or lines) of text, press p.

  • To delete the current line of text, press dd.

  • To delete 4 lines (the current line plus the 3 following lines) of text, press 4dd.

  • To delete from where your cursor is currently, to the end of the word, press dw.

  • To delete from where your cursor is currently to the end of the line, press d, then Shift+4

  • To delete from the current line to the end of the file, press d, then Shift+G

  • To display the line number for each row of text, type :set nu and press enter.

  • To undisplay the line number for each row of text, type :set nonu and press enter.

  • To jump to line 453 in the file, type :453 and press enter.

  • To search for a string of text, type /string and press enter. Each instance of string should be highlighted.

  • After searching for a string, you can jump to the next matching string by pressing the N key.

  • You can jump to the previous matching string by pressing Shift+N.

  • You can scroll through you previous searches in vi by typing a slash (/) and then scrolling through the old searching using the up arrow key.

  • To find and replace all instances of a string, type :%s/old_text/new_text/g and press enter.

  • To find and places all instances of a string only on the current line, type :s/old_text/new_text/g and press enter.

  • You can scroll through your previous commands (even from now-closed Terminal windows) by typing a colon (:) and then scrolling through the old commands using the up arrow key.

  • To undo your most recent change/edit, press the U key. This can be done multiple times.

  • To save changes without quitting vi, type :w and press enter.

  • To quit without saving changes, type :q! and press enter.

  • To quit and save changes, type :wq and press enter.

It may also be helpful to know about swap files. When you use vi to edit a file, you are not actually editing the actual file. vi creates a swap file that stores all your changes in that file, and then if you decide to save those changes, vi will replace the original file with the swap file that contains your edits. If you use vi to edit a file called file.txt, then vi creates a swap file called .file.txt.swp where your edits are stored. Notice that the filename starts with a dot (.) meaning the file is hidden (although you can see it with ls using the ls -a command). Also, you can only have one swap file for a given file at a time. For that reason, try to make sure you always close out of your file before exiting the terminal.

Modes

The vi program has several available modes/options that help you edit a text file. The ones that I will cover here include Insert, Macro, and Replace.

Insert

The Insert mode in vi is used to, you guessed it, insert new text. To enter insert mode, press the I key on your keyboard. Once you do this, and while you remain in insert mode, the bottom of your Terminal window should say

--INSERT--

This mode will allow you to insert new text and use the backspace key just as if you were in a text editor you are more familiar with (i.e. that word processing program that shall remain nameless). The only difference is that you will not be able to use your mouse to move your cursor around. Simply stick to using the arrow keys. Pressing I will allow you to enter Insert mode in the exact place where your cursor is currently. However, you can also press the A key on your keyboard to enter Insert mode. The difference is when you press A, you will enter Insert mode and your cursor will move to the position immediately to the right of its current position. This is helpful when you are at the end of a line and you want to add new text. Once you are done making your edits, you need to exit Insert mode by pressing the Esc key on your keyboard.

Macros

A helpful but slightly more advanced technique is creating macros (a shortcut to a task you do repeatedly) in vi. You can easily use Insert mode to edit your text. But if you need to make the same edit a thousand times it is a waste of your time and effort to do that by hand. Instead, you can create a macro to do it for you. Let’s say I have a file that contains the same line repeating over and over for 10 lines.

Now let’s say we want to change part of that line to something else, but for only every other line (if it were all lines we could just use the find and replace option shown previously using :%s). For this situation, we could setup a macro to do it for us. Pressing the Q key on your keyboard (while not in any other modes, such as Insert, obviously) tells vi that you want to start a macro. Then press one letter/number on your keyboard that will be the ‘name’ of that macro for this vi session. I typically press the A key because it’s the first letter of the alphabet and easy to remember. But if you are creating multiple macros in the same vi session then you may want to assign them letters that will help you remember what task they perform. Once you press the A key (or whatever letter/number you are assigning to this macro), you should see the following appear in the bottom left corner of your Terminal window

recording

This lets you know that vi is recording every move and change you make. Before I pressed Q (and A), I moved my cursor over the first letter of what we wanted to change. This is important because remember vi is recording every move you make, including movements of your cursor. Once I have pressed Q and A so vi is recording, I type dw twice to remove two words. Then I type I to enter Insert mode and type my new desired phrase. I then press Esc to exit Insert mode, and finally move my cursor so it is on the first letter of what I want to change two lines down. At this point, I have finished making my macro such that if I were to repeat the macro from my current cursor position then my initial phrase would be replaced by the new phrase and the cursor would be moved down two lines again. Once you are done making your macro you need to tell vi that you are done by pressing the Q key again (the recording in the bottom left corner of the Terminal window should disappear now). vi has now saved your macro. To run your macro five times, type 5@A (i.e. press the 5 key, then press Shift+2, and then press the A key on your keyboard). After defining the macro and running it 5 times the text now looks like what we wanted.

Replace

The vi program has two different methods of text replacement. The first is to only replace a single character in the text file. If you press the R key followed by pressing the K key, then the character your cursor was on will be replaced by a "K". As an example, consider the following line in a text file.

The mouse kissed the little boy.

If you place your cursor over the "b" of "boy" and then press the R key followed by the T key on your keyboard, the text will change to

The mouse kissed the little toy.

See how that works?

The other version of Replace is to enter an explicit Replace mode where everything you type overwrites whatever was already there (similar to the functionality of the Insert key on some keyboards). To enter this replace mode you need to press Shift+R while in vi (and, of course, you cannot be in any other modes at the time). Once you press Shift+R you will see the following in the bottom left corner of your Terminal window

--Replace--

as a reminder that you are in Replace mode. Once again, while you are in this mode, you can type and move around all you want, but anything you type will overwrite any current text in the file. To exit Replace mode, simply press the Esc key on your keyboard (just like you do to exit Insert mode earlier) and the "--Replace--" in the bottom left corner of your Terminal window will disappear. Just like with Insert mode, you will need to exit Replace mode before you can save any changes.

Visual

vi also has a Visual block mode available that you can enter by pressing the V key on your keyboard, but I won’t go into any detail about this mode now. I just want to mention that it exists. If you want to know more information, Google it. :)

Reading File Contents

head - Reading the Start of a File

The head command will print the top lines of a file to the Terminal screen. The general syntax for the head command is head [options] filename. By default, head will print the first 10 lines of the file you provide. This command can also be used to print a different number of lines at the top of a file. For example,

head -n 25 water_optimization.log

will print out the first 25 lines of the file water_optimization.log. Using the -n flag allows you to specify the number of lines printed.

tail - Reading the End of a File

tail is a simple command that prints the last lines of a file. By default, tail prints the final 10 lines of a file to the Terminal screen.

You can also use tail to print only a certain number of lines to the screen. For example, if I only wanted the last 3 lines of a file, I could use the -n flag preceeding a 3.

tail -n 3 filename

You can also use the tail command with the -f flag if you are running a calculation to update the end of the file as the output is printed. So if you type

tail -f logfile

the final 10 lines of logfile will be printed to the screen, then as more lines are printed to logfile, they will also show up on the screen. This will continue until you signal the computer to end the printing by pressing ctrl-c on the keyboard, which terminates the tail command.

cat - Printing File Contents to the Terminal

The cat command will print the entire contents of a file to the terminal window. You will be able to scroll in the terminal to see the start of the file, but unless more or less, the whole file is printed to the terminal at once. You can use this command with:

cat <filename>

less - Reading a File in the Terminal

The less command provides users with the ability to read a file without worrying about the chance of unintentionally editing the file. The general syntax for less is

less filename

When you execute a less command to read a file, the Terminal window will be filled with the contents of the file (i.e. you will not be able to see any of your previous commands in the current window). You can scroll up and down to see the contents of the file using the arrow keys on your keyboard. You can also page down using the spacebar key, and page up using the B key (short for back) on your keyboard. You can also search for instances of a certain word or phrase by typing

/text to search for

Once you press the slash key (/) your cursor will move to the bottom of the Terminal window where you will be able to see what text you are typing. Press enter to search the file for the text. Every instance of the phrase should be highlighted. If you accidentally press the slash key (or decide that you do not want to search for the text you started typing, you can just backspace until the slash is gone and you will be able to scroll through the text of the file again.

You can also go straight to the end of a file by pressing Shift+G on the keyboard.

To quit less just press the Q key (for quit).

more - Reading a File in the Terminal

more is an antiquated version of less that can be used for reading (but not editing) files. Unlike less, with more you can only page down (e.g. you cannot scroll line-by-line with the arrow keys, and you cannot page or scroll up at all) and you cannot search the text for a string of phrase.

sed - Editing Files in the Command Line

Information for this section can be found on this StackOverflow page.

sed is an extremely useful tool, especially to the two people in the world that really know how to use it. And I am not one of those two people. As you might be guessing from my previous statements, sed is not an easy language to understand, but it is still useful thanks to Google. Just knowing that sed exists and the types of things it can do make it useful because you can likely find someone else that has reported the exact thing you want to do online. Below I have listed a few of the more common sed commands I have used on a regular basis and what they do.

Remove the first line of a file, often done to remove the heading names of columns:

sed '1d' filename

The results will be printed to the Terminal screen.

Remove lines 1 to 5766 in a file:

sed -i '1,+5766d' filename

The -i in this case will delete lines 1 to 5766 and instead of printing the results to the screen, will just save the results in the original filename.

Find and replace the commas in filename with a space instead:

sed 's/,/ /g' filename

Often when working with large amounts of data/jobs, you might accidentally make a typo that is now present in all of your files. There are a number of ways to correct this typo, but one of the easiest is to find-and-replace the erronous part of the file with the correct version.

If the problem is only in one file, it's easy enough to just vi into the folder or open it in a text editor and fix it. One way to do this is to find-and-replace-all with one of the commands that's a part of vim:

:%s/string_to_find/replacement_string/g

This code will find every instance of "string_to_find" in your file and replace them all with "replacement_string". The "g" at the end of this command is what tells vi to replace EVERY instance of the string, rather than just the next one (if this is what you want, just leave out the "g")

Note

This is an example of the sed command, just one that's already incorporated into vim. To use this feature, just make sure you aren't in Insert mode or anything else by pressing esc.

If this is an error that exists in all of your files, it can be tedious to go into every file, make the change, save it, then move onto the next one. This is where the sed command comes in handy.

With this command, you can find and replace a string into several select files in your directory. For example, if you have a lot of Gaussian input files where you accidentally forgot to include your solvent, you can use the following command:

sed -i 's/m062x 6-31+G* opt freq/m062x 6-31+G* opt freq scrf=(smd,solvent=water)/g' *com

This command will replace the incorrect route line with the corrected version in all files within the directory which end in "com". m062x 6-31+G* opt freq is the original line with the error, while m062x 6-31+G* opt freq scrf=(smd,solvent=water) is the corrected line. This is a fast, easy way to make sure that you are running jobs at the desired level of theory, or if you decide that you want to repeat the calculation with a different basis set/functional/solvent.

Again, there are lots of other uses for sed that aren't listed. Many introductions and tutorials for sed can be found online. And if you do actually learn how to actually use sed, please write it up here. :)

Warning

It is possible if working on BSD systems like MacOS that you might need to include the additional extension -i '.bak' in your command to avoid risking corruption or partial content.

Condensing Files/Folders

tar

Often if you are asked to package up some of your files to send to someone else, they will request you send them a tarball. Don’t be scared, this is not an athletic term that you haven’t heard of before. A tarball is a file created using the tar command that often contains several files and/or folders. Repackaging a bunch of files into a single file makes them easier to distribute. For example, if you download the source code for a program, you will most likely be downloading a tarball of all the files. A tarball also gives you the opportunity to zip all the files to make the tarball smaller than the combined sum of all the files individually, which is also helpful for distributing files. The general syntax for tar is

tar [options] tarball.tar folder/files

You can either tar up a folder (or several) and/or a bunch of files, although it is more common to put all the files into a single folder and then make a tarball of the folder. This makes it more convenient for whoever unpacks the tarball you are creating.

The same command, tar, is used to both make the tarball and unpack the tarball, which means the options are important here. Most often I will use the following tar command to create a tarball

tar -zcvf tarball.tgz folder/

Notice that the end of the file is now marked with a .tgz instead of .tar, and that is because we have zipped (the z from -zcvf) the folder and its contents. The c option is signaling that we want to create a tarball. The v option makes the process verbose (i.e. it prints as much information as possible during the process), and the f option lets tar know we want to put the contents into an archive file (that we call tarball.tgz).

Furthermore, to unpack a tarball we will use the exact same command, but instead of using the c option to create a tarball we use an x flag to extract the contents of a tarball.

tar -zxvf tarball.tgz

Notice that we don’t need to include the final folder/file since we aren’t creating the tarball here, we are just extracting it. Also, if the tarball.tgz was alternatively named tarball.tar (an indication that it was not zipped), then you would not need to use the z flag.

Finally, using the -z compresses the files using gzip Tarballs can also be zipped using bzip, and in that case you will need to replace the -z flag with a -j flag.

zip

zip is similar to tar, where it compresses the files you specify into a smaller file which can be unzipped later on. This is a helpful way to transporting/sharing large amounts of data or whole folders of information. The way that you use this command is:

zip name_of_zip_file.zip <files to be condensed>

Looking for Patterns

grep - Print Lines Matching a Pattern

grep is a very useful tool for searching a very long file for a certain string and printing the results to the screen. The general syntax is

grep pattern file

An example might be searching through a long Gaussian output file to make sure that your calculation finished normally. An example of this might be

grep "Normal termination" water_optimization.log

In this case, I have searched through the file water_optimization.log for the string "Normal termination". This string is printed in the file when the calculation finsihed normally/without error. This can also be done to search for energies or timing information.

awk - Pattern Scanning

awk is a language for pattern recognition and scanning. This is particularly useful when performing analysis and printing out a bunch of values and potentially even doing simple math on those values. For example, if I have a file with lots of columns of text and numbers, but I am only interested in the contents of one column, say the first, then I can use awk to give me only that information.

awk '{print($1)}' file_of_interest.txt

This command will print the first column from the file of interest to the Terminal screen. You can also do basic math (if that column contains numbers) using typical python math symbols. For example, to multiply the value in the first column by 4, you could type:

awk '{print($1*4)}' file_of_interest.txt

sort - Sorting and Re-Ordering Data

The sort command does exactly what its name implies - it sorts the lines of text files and puts them in a specific order. The general syntax is

sort [options] filename

If you have a file of data with 8 randomly placed numbers in it, the data can easily be re-ordered using sort.

By default, sort will rearrange the values and places them in ascending order. You can use the -r flag to reverse the order and put the values in descending order. You can even use sort to randomly reorder the values with the -R flag.

Additionally, you can also sort "human readable numbers" such as 2k (2,000) or 3M (3,000,000) using the -h flag.

sort also has the ability to order letters and words alphabetically. Consider a file that contains a bunch of types of animals. sort will automatically reorder them alphabetically.