Linux Terminal: Removing/Changing newlines without using a text editor

In the interests of automation, I wanted to be able to convert the output of a command (in this case “ls”) into a format that can be read by another program.

To do this I had to replace the newline characters in the command’s output with the pipe character (“|”). Now, I could do this with vim, but that wouldn’t be as nice as having a non-interactive way that I could call upon later at a moment’s notice.
Initially I went with the “sed” command, but as it turns out, sed strips the trailing newline off each line of text before processing it, so a substitution never gets to see the newline!
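To see the problem for yourself, here is a small sketch. The three sample lines are made up and stand in for real command output. The substitution looks reasonable, but sed never finds a newline to replace, so the text comes out unchanged.

```shell
# Three made-up lines standing in for the output of a command such as ls -1
attempt=$(printf 'alpha\nbeta\ngamma\n' | sed 's/\n/|/g')

# Still three separate lines: sed removed each newline before the
# substitution ran, then put it back when printing the line
printf '%s\n' "$attempt"
```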
Let me break down the command I went with…
The ls -1 command will list the contents of a directory with one item per line. This is good because the next command processes its input a line at a time.
For the next command to read the results of the previous one, the output of one must be “piped” into the input of the other. This uses something called “file descriptors” (I may do another post on what I’ve learned about them in the future). The way one chains commands together like this is simply by separating them with the pipe (“|”) character.
The awk command reads each line (technically, it reads to the next newline character) and performs some operation on it. By default it will simply print the line, or “record” according to its documentation. But seeing as I’m not interested in anything with a . in the file/folder name, I’ve added a regular expression that means awk will only print records (lines) without a dot.
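A minimal sketch of that filter on its own, using made-up names in place of a real directory listing. The pattern matches only lines made up entirely of non-dot characters, and because no action is given, awk falls back to printing each matching record.

```shell
# Made-up names standing in for a directory listing
filtered=$(printf 'docs\nnotes.txt\nsrc\narchive.tar.gz\n' | awk '/^[^.]*$/')

# Only the names without a dot survive: docs and src
printf '%s\n' "$filtered"
```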
That’s all very well, but we still need to change the newlines to pipes.
Awk has two variables that look interesting, “RS” and “ORS”, which stand for “Record Separator” (for incoming data) and “Output Record Separator” respectively. As both are set to the newline character by default, the only one we need to change is ORS, so that each record is followed by a pipe character instead of a newline on the command’s output.
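Here is a rough illustration of ORS in isolation, again with made-up sample names. The program just prints every record, and the ORS assignment is passed as an argument after it, the same way as in the final command.

```shell
# Print every record, but end each one with "|" instead of a newline
joined=$(printf 'docs\nsrc\nbin\n' | awk '{ print }' ORS='|')

# prints: docs|src|bin|
printf '%s\n' "$joined"
```

Note the pipe after the last record as well. Passing the assignment with awk's -v option (awk -v ORS='|' '{ print }') should give the same result here.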
Because awk writes ORS after every record, including the last one, the output now ends with an unwanted pipe. A simple sed command replaces the last character on the line (as long as it’s a “|”) with nothing, removing that final pipe.
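A small sketch of that clean-up step on its own, fed with made-up strings rather than real awk output. The $ anchors the match to the end of the line, so only a trailing pipe is removed and the ones between names are left alone.

```shell
# A trailing pipe is removed...
trimmed=$(printf 'docs|src|bin|' | sed 's/|$//')
printf '%s\n' "$trimmed"      # prints: docs|src|bin

# ...and a line that does not end in a pipe is left untouched
untouched=$(printf 'docs|src' | sed 's/|$//')
printf '%s\n' "$untouched"    # prints: docs|src
```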
Finally, the result of the command is written to the file ~/output.txt with the > character.

ls -1 | awk '/^[^.]*$/' ORS='|' | sed 's/|$//' > ~/output.txt

As with all programming, there were multiple ways to go about this. Some used only commands built into the shell and would be faster than this, but as the file I needed to convert was small, speed was not a concern, and this way was a lot more readable.
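For comparison, here is one possible shape of a builtin-only approach. It is a sketch, not necessarily one of the alternatives mentioned above. The scratch directory, the sample names and the variable names are all made up so the example can run anywhere. mktemp, touch and rm are only used for that setup; the loop itself relies on nothing but the shell.

```shell
# Hypothetical scratch directory with sample entries (setup only)
workdir=$(mktemp -d)
touch "$workdir/docs" "$workdir/src" "$workdir/notes.txt"

result=
for path in "$workdir"/*; do
    name=${path##*/}                    # strip the directory part
    case $name in
        *.*) continue ;;                # skip names containing a dot
    esac
    result="${result:+$result|}$name"   # add "|" only between entries
done

printf '%s\n' "$result"                 # prints: docs|src
rm -r "$workdir"
```

Because the pipe is only added between entries, there is no trailing pipe to strip afterwards, though the loop is arguably harder to read at a glance than the one-line pipeline.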

 
