Learning Perl - part 2

Finding Lines Of A File Which Contain a Certain Word

The following script finds the line(s) in a file that contain a specific word. More specifically, it finds the entry for a given user (cmrice) in the password file and prints that entry. If the user does not exist, it prints nothing, which may or may not be what you want.

#!/usr/bin/perl
#ex2.pl
open (IN,"/etc/passwd");
while ($line = <IN>)
{
    $result = grep(m#cmrice#,$line);
    if ($result == 1)
    {
        print "$line\n";
    }
}
close IN;

Although this script does what it is supposed to do, it actually does a little more. If you add extra characters to what is being printed, it is easier to see exactly what is happening. This is a useful technique for finding out what is happening when the output may contain "whitespace" (spaces, tabs or other characters that look "white" on a printed page). Change the print statement to the following and rerun the script.

print "[$line]\n";

Does the output look like what you expected? The "]" character should be on a separate line from the rest of the data. This is because each line of a text file has an "end-of-line" character at the end: the newline ("\n"), often loosely called a "carriage return." Perl was written to deal with any type of file, not just text files, so it does not automatically discard this final character. If the data being read by a script is text, the script should strip off the newline. If it does not, strange things (like the output above) can happen. Another example of this strange behavior occurs when you compare a word from the middle of a line with a word from the end of a line. If the newline has not been stripped off, the two words look the same but fail to match when compared in something like an if statement.
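The failed comparison can be seen in a minimal sketch. The sample strings are made up for illustration, and `use strict` with `my` declarations are tidy-ups that the tutorial's own scripts do without.

```perl
use strict;
use warnings;

# Hypothetical sample data: the same word with and without the newline
# that a line read from a file would still carry.
my $from_file = "cmrice\n";
my $plain     = "cmrice";

# eq compares every character, including the invisible newline.
my $verdict = ($from_file eq $plain) ? "match" : "no match";
print "comparison: $verdict\n";

# The lengths show the extra character.
my $len_file  = length($from_file);
my $len_plain = length($plain);
print "lengths: $len_file and $len_plain\n";
```

The comparison reports "no match" and the lengths come out as 7 and 6, even though both strings look identical when printed.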

The Chop Command

The command that removes this extra character is chop, which deletes the last character of a string, whatever that character is. (There is also a chomp command, which removes the final character only if it is a newline. chomp was introduced in Perl 5, so it is not available to scripts that must also run under Perl 4.) The following script has an additional line (line 6) that solves this problem.

#!/usr/bin/perl
#ex3.pl
open (IN,"/etc/passwd");
while ($line = <IN>)
{
    chop($line);
    $result = grep(m#cmrice#,$line);
    if ($result == 1)
    {
        print "$line\n";
    }
}
close IN;
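The difference between chop and chomp only shows up when a string does not end in a newline, for example the last line of a file that lacks one. The following is a small sketch with made-up strings, not part of the tutorial's scripts.

```perl
use strict;
use warnings;

# Hypothetical input: one string with a trailing newline, one without.
my $with_newline    = "cmrice\n";
my $without_newline = "cmrice";

# chop always removes the last character.
my $chopped_a = $with_newline;
chop($chopped_a);
my $chopped_b = $without_newline;
chop($chopped_b);

# chomp removes the last character only if it is a newline.
my $chomped_a = $with_newline;
chomp($chomped_a);
my $chomped_b = $without_newline;
chomp($chomped_b);

print "chop:  [$chopped_a] [$chopped_b]\n";   # chop:  [cmrice] [cmric]
print "chomp: [$chomped_a] [$chomped_b]\n";   # chomp: [cmrice] [cmrice]
```

With chop, the string that had no newline loses its final "e", which is the kind of silent damage chomp avoids.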

Breaking the Input Into Parts

So far, this script does nothing more than a standard grep. A more useful script might be one that returns the user’s real name. In a shell script, this might be implemented by passing the line from the password file to cut or awk, which would pull out the appropriate field. In perl, the command that is used to break up a line of data is split. The split command requires a character (or characters) on which to break the input, and some kind of input to break up. Since the split command is going to return more than one value, an array is a good place to store the output. The following script has an additional line (line 10) which breaks up $line at each colon (throwing each colon away) and stores the results in an array named parts. (Array names start with an @, such as @parts, but individual elements start with a $, as in $parts[4].) Also note that array indexes start at zero (0), so the first element is stored in $parts[0], and the second is stored in $parts[1]. The real name is the fifth field of a password entry, so it ends up in $parts[4].

#!/usr/bin/perl
#ex4.pl
open (IN,"/etc/passwd");
while ($line = <IN>)
{
    chop($line);
    $result = grep(m#cmrice#,$line);
    if ($result == 1)
    {
        @parts = split(m#:#,$line);
        print "$parts[4]\n";
    }
}
close IN;
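To see what split produces without touching the real password file, here is a sketch that splits a single hard-coded entry. The entry itself, including the name "Chris Rice", is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical password-file entry (seven colon-separated fields).
my $line = "cmrice:x:1001:100:Chris Rice:/home/cmrice:/bin/sh";

# Break the line at each colon; the colons themselves are discarded.
my @parts = split(m#:#, $line);

# Indexes start at 0, so the fifth field is $parts[4].
my $field_count = scalar(@parts);
print "login:     $parts[0]\n";    # login:     cmrice
print "real name: $parts[4]\n";    # real name: Chris Rice
print "fields:    $field_count\n"; # fields:    7
```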

Using Command Line Arguments

This script would be more useful if it could find the real name associated with any login, not just cmrice. One way to do this is to have the program accept what is called a "command line argument." This means that when the script is run, it expects the argument to follow the script name, as in "ex5.pl cmrice". When a script expects command line arguments, it is good practice for it to print a usage statement if it does not get any. This is a good way to tell the user how to use the script.

In perl, the command line arguments are stored in an array named @ARGV. One way to determine how many items are in an array is to check $#<array-name>, which holds the index of the last element: it is one less than the number of elements, and -1 when the array is empty. So to find out whether any command line arguments were given to a script, we compare $#ARGV with -1. The following script uses a command line argument: lines 3-11 check for the argument and save it into a variable named $login, while line 16 has been changed to use $login instead of the static string "cmrice".

#!/usr/bin/perl
#ex5.pl
if ($#ARGV == -1)
{
    print "USAGE: $0 <userID>\n";
    exit;
}
else
{
    $login = $ARGV[0];
}
open (IN,"/etc/passwd");
while ($line = <IN>)
{
    chop($line);
    $result = grep(m#$login#,$line);
    if ($result == 1)
    {
        @parts = split(m#:#,$line);
        print "$parts[4]\n";
    }
}
close IN;
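The relationship between $#array and the number of elements is easy to check in isolation. This sketch uses a stand-in array named @args (a made-up name) rather than @ARGV so that it behaves the same however it is run.

```perl
use strict;
use warnings;

# An empty array: last index is -1, element count is 0.
my @args = ();
my $last_empty  = $#args;
my $count_empty = scalar(@args);

# Two elements: last index is 1, element count is 2.
@args = ("cmrice", "root");
my $last_two  = $#args;
my $count_two = scalar(@args);

print "empty: last index $last_empty, count $count_empty\n";  # -1, 0
print "two:   last index $last_two, count $count_two\n";      # 1, 2
```

Testing `scalar(@ARGV) == 0` would be an equivalent way to write the check in ex5.pl.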

Writing Files and Associative Arrays

Here’s an example that demonstrates how to create an output file and how to use associative arrays. A normal array uses numbers (as in $parts[4]) while an associative array can use strings to access the data stored in it. This version of the script reads in the entire password file, building an associative array in which the index is the login ID and the value is the real name. It also takes a command line argument and uses it as an index into the associative array to find the real name. Instead of printing the answer to the screen, the script writes it to an output file.

#!/usr/bin/perl
#ex6.pl
if ($#ARGV == -1)
{
    print "USAGE: $0 <userID>\n";
    exit;
}
else
{
    $login = $ARGV[0];
}
open (IN,"/etc/passwd");
while ($line = <IN>)
{
    chop($line);
    @parts = split(m#:#,$line);
    $userinfo{$parts[0]} = $parts[4];
}
close IN;
open(OUT,">data");
{
    print OUT "$userinfo{$login}\n";
}
close OUT;
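The lookup step can be sketched on its own with a couple of invented entries. Both sample lines and the fallback text "(no such user)" are assumptions for illustration; the sketch also uses exists to tell a missing login apart from a found one, which ex6.pl does not do.

```perl
use strict;
use warnings;

# Hypothetical password-file entries.
my @lines = (
    "cmrice:x:1001:100:Chris Rice:/home/cmrice:/bin/sh",
    "jdoe:x:1002:100:Jane Doe:/home/jdoe:/bin/sh",
);

# Build the associative array: login ID => real name.
my %userinfo;
foreach my $line (@lines)
{
    my @parts = split(m#:#, $line);
    $userinfo{$parts[0]} = $parts[4];
}

# Look up one login that is present and one that is not.
my $found   = exists($userinfo{"cmrice"})  ? $userinfo{"cmrice"}  : "(no such user)";
my $missing = exists($userinfo{"nobody2"}) ? $userinfo{"nobody2"} : "(no such user)";

print "cmrice:  $found\n";    # cmrice:  Chris Rice
print "nobody2: $missing\n";  # nobody2: (no such user)
```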

Syntax:

end lines with a semi-colon ";"

use curly braces "{" and "}" around code segments

$_ is the default variable used by many functions (chop, split, print, etc)

Variables:

most variables start with a "$"

lists start with an "@"

@chars=split(//);

"associative" arrays use curly braces - associative arrays use strings as their indexes into an array

$dev = "c0t0d0s6";

$tfilesys{$dev} = "dump";

concatenating strings with the "." operator

$fname="/".$mname."/home/data/sybase.devices";

Conditionals:

formatting includes parentheses around the condition, curly braces around the "action"

if ($mname eq "limbsbk1")
{
    $omach = "bk3";
}
elsif ($mname eq "limbsbk2")
{
    $omach = "bk4";
}

numeric	string	meaning
==	eq	equal to
!=	ne	not equal to
>	gt	greater than
>=	ge	greater than or equal to
<	lt	less than
<=	le	less than or equal to
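Picking the wrong column of this table gives answers that look plausible but are wrong. The following sketch, using made-up values, compares the same operands numerically and as strings.

```perl
use strict;
use warnings;

# Numerically 10 and 10.0 are equal; as strings they differ.
my $num_equal = ("10" == "10.0") ? "true" : "false";
my $str_equal = ("10" eq "10.0") ? "true" : "false";

# Numerically 9 is less than 10; as strings "9" sorts after "10"
# because the comparison goes character by character ("9" vs "1").
my $num_less = (9 < 10)      ? "true" : "false";
my $str_less = ("9" lt "10") ? "true" : "false";

print "10 == 10.0: $num_equal\n";  # true
print "10 eq 10.0: $str_equal\n";  # false
print "9 <  10:    $num_less\n";   # true
print "9 lt 10:    $str_less\n";   # false
```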

Opening files:

for reading:

a static filename

open (IN,"/tmp/vfstab.other");

a variable filename

$fname="/".$mname."/home/data/sybase.devices";

open (IN,"$fname");

reading the output of an external command (note the "|" at the end)

open(IN,"uname|");

reading the output of an external command with a variable in it

open (IN,"/usr/bin/remsh $omach /usr/sbin/swap -l |");

for writing (note the ">" at the beginning)

open (OUT,">/tmp/slices2.dat");

Closing files

close IN;

close OUT;
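None of the open calls above check whether the file could actually be opened. A cautious sketch of a write followed by a read might look like the following; the file name under /tmp is hypothetical, and the "or die" checks are an addition to the style used in this tutorial.

```perl
use strict;
use warnings;

# Hypothetical scratch file; $$ is the process ID, to keep the name unique.
my $fname = "/tmp/open_sketch.$$";

# Open for writing (note the ">") and stop with a message if that fails.
open(OUT, ">$fname") or die "cannot write $fname: $!\n";
print OUT "first line\n";
print OUT "second line\n";
close OUT;

# Open the same file for reading and collect its lines.
my @seen;
open(IN, "$fname") or die "cannot read $fname: $!\n";
while (my $line = <IN>)
{
    chop($line);
    push(@seen, $line);
}
close IN;

# Remove the scratch file again.
unlink($fname);

my $count = scalar(@seen);
print "read $count lines\n";   # read 2 lines
```

The special variable $! holds the system's reason for the failure, such as a missing directory or a permissions problem.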

Reading From Files

the default method, stores the line that is read in $_

while(<IN>)
{
    chop;
    $mname = $_;
}

I prefer to read the line into a variable with a name of my choosing

while ($line=<IN>)
{
    chop($line);
}

The chop command removes the last character from the end of the input, which here is the newline.

Writing to files:

The standard print and printf commands are used, with the filehandle

print using the "default" formatting (use print rather than printf here, so that a "%" in the data is not taken as a format)

print OUT "$device\t$save\t$line\n";

print using specified formats

printf OUT "%8s %3d %30s\n", $device, $save, $line;
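The effect of a format such as "%8s %3d %30s" is easier to see with sprintf, which returns the formatted string instead of printing it. The values below are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical values; note the "%" inside the last one.
my ($device, $save, $line) = ("c0t0d0s1", 3, "50% full");

# Plain interpolation: values are joined exactly as they are.
my $plain = "$device $save $line\n";

# Fixed-width fields: 8 characters, 3 digits, 30 characters,
# each right-justified and padded with spaces on the left.
my $formatted = sprintf("%8s %3d %30s\n", $device, $save, $line);

print $plain;
print $formatted;

# 8 + 1 + 3 + 1 + 30 characters plus the newline.
my $width = length($formatted);
print "formatted width: $width\n";   # formatted width: 44
```

Because the data is passed as arguments rather than placed in the format string, the "%" in "50% full" is printed as-is.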

Split:

The split command breaks a "string" into parts, making the breaks at the specified character or characters. Those familiar with the unix cut command will see many similarities. Regular expressions are useful for more complicated definitions of where to make the break.

The format of the statement is:

(<part1>,<part2> ... <partN>)=split(<what to split on>,<what to break into parts>);

break $device into 4 parts, making the break at a "/"

($j5,$j6,$j7,$dev) = split(/\//,$device);

break $line into 6 parts, making the break at one or more spaces ([ ]+ is a regular expression)

($j,$dbname,$dbsize,$device,$lname,$devsize)=split(/[ ]+/,$line);

break $line into 3 parts, making the break at one or more spaces or tabs (\t = a tab)

($j1,$j2,$filesys)=split(/[ \t]+/,$line);

break $g2 into two parts, making the break at either a space or a colon

($junk,$num)=split(/[ \:]/,$g2);
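The split statements above can be tried on sample input to see which piece lands in which variable. The input strings in this sketch are made up, modeled on the device names used elsewhere on this page.

```perl
use strict;
use warnings;

# A path that starts with "/" yields an empty first piece.
my $device = "/dev/rdsk/c0t0d0s6";
my ($j5, $j6, $j7, $dev) = split(/\//, $device);
print "[$j5] [$j6] [$j7] [$dev]\n";   # [] [dev] [rdsk] [c0t0d0s6]

# A run of spaces and tabs counts as a single break.
my $line = "/dev/dsk/c0t0d0s1 \t/dev/rdsk/c0t0d0s1   /home";
my ($j1, $j2, $filesys) = split(/[ \t]+/, $line);
print "[$j1] [$j2] [$filesys]\n";     # [/dev/dsk/c0t0d0s1] [/dev/rdsk/c0t0d0s1] [/home]

# Break at either a space or a colon.
my $g2 = "slice 3:";
my ($junk, $num) = split(/[ \:]/, $g2);
print "[$junk] [$num]\n";             # [slice] [3]
```

The empty first piece explains why the scripts on this page assign it to a throwaway variable and test the second piece ("dev" or "dsk") instead.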

#!/usr/bin/perl
#get_slice_info.pl.4
#2345678901234567890123456789012345678901234567890123456789012345678901

#PT run an external command and
#   process the output of that
#   command by opening a file with
#   the "filename" being the
#   external command ending with a
#   pipe - perl will then read
#   from the pipe

#PT by default each line of input
#   is stored in a variable
#   named $_

# get the machine name
open(IN,"uname|");
while(<IN>)
{
    chop;
    $mname = $_;
}
close IN;

#PT perl does not have a "case" or
#   "switch" statement so just use
#   a bunch of if statements

# determine the "other" machine's name
if ($mname eq "limbsbk1")
{
    $omach = "bk3";
}
elsif ($mname eq "limbsbk2")
{
    $omach = "bk4";
}
elsif ($mname eq "limbsbk3")
{
    $omach = "bk1";
}
elsif ($mname eq "limbsbk4")
{
    $omach = "bk2";
}

#PT the dot operator concatenates
#   variables to build more complex
#   strings

# build the filename, based on the machine name
$fname="/".$mname."/home/data/sybase.devices";
open (IN,"$fname");

# read a line into the variable $line
while ($line=<IN>)
{
    #PT remove the newline at
    #   the end of the line
    chop($line);

    #PT the split command breaks the
    #   input into parts, making the
    #   break at the specified
    #   character or characters
    ($j,$dbname,$dbsize,$device,$lname,$devsize)=split(/[ ]+/,$line);
    if (($j eq "") && ($dbname ne "") && ($dbsize ne ""))
    {
        ($j5,$j6,$j7,$dev) = split(/\//,$device);
        if ($j6 eq "dev")
        {
            if ($dbname ne "NULL")
            {
                #PT perl has "associative" arrays
                #   like awk. instead of using
                #   a number as an index into the
                #   array, you use a string
                #   $array{"c0t0d0s0"}="boot"
                $devices{$dev} = $lname;
                $sizes{$dev} = $devsize;
                $usage{$dev} = $dbsize;
                #print "[$dev] [$devices{$dev}] [$usage{$dev}] [$sizes{$dev}]\n";
            }
            else
            {
                $devices{$dev} = $lname;
            }
        }
    }
}
close IN;

# print out the whole array (in sorted order)
#foreach (sort keys %devices)
#{
#    print "[$_]=[$devices{$_}]\n";
#}
###

#PT to run an external command
#   use system

# get information about remote filesystems
system("/usr/bin/rcp $omach:/etc/vfstab /tmp/vfstab.other");

#PT to open a file with a "static"
#   name for reading,
#   just open it with the filename
#   in quotes
open (IN,"/tmp/vfstab.other");
while ($line=<IN>)
{
    chop($line);
    ($j1,$j2,$filesys)=split(/[ \t]+/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $ofilesys{$dev} = $filesys;
    }
}
close IN;

#PT run an external command and
#   process the output of that
#   command by opening a file with
#   the "filename" being the
#   external command ending with a
#   pipe - perl will then read
#   from the pipe

# get information about remote swap space
open (IN,"/usr/bin/remsh $omach /usr/sbin/swap -l |");
while ($line=<IN>)
{
    chop($line);
    ($j1)=split(/[ \t]/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $ofilesys{$dev} = "swap";
    }
}
close IN;

# get information about remote dump space
open (IN,"/usr/bin/remsh $omach /usr/sbin/fdump -l |");
while ($line=<IN>)
{
    chop($line);
    ($j1)=split(/[ \t]/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $ofilesys{$dev} = "dump";
    }
}
close IN;

#foreach (sort keys %ofilesys)
#{
#    print "[$_]=[$ofilesys{$_}]\n";
#}

# get information about local filesystems
open (IN,"/etc/vfstab");
while ($line=<IN>)
{
    chop($line);
    ($j1,$j2,$filesys)=split(/[ \t]+/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $tfilesys{$dev} = $filesys;
    }
}
close IN;

# set up info about secondary boot device
$tfilesys{"c0t0d0s1"} = "/";
$tfilesys{"c1t1d0s1"} = "/ - secondary";
$tfilesys{"c1t1d0s3"} = "/usr - secondary";
$tfilesys{"c1t1d0s4"} = "/var - secondary";
$tfilesys{"c1t1d0s8"} = "/stand - secondary";
$tfilesys{"c1t1d0s9"} = "/opt - secondary";
$tfilesys{"c1t1d0sa"} = "/tmp - secondary";
$tfilesys{"c1t1d0sb"} = "/home - secondary";

# get information about local swap space
open (IN,"/usr/sbin/swap -l |");
while ($line=<IN>)
{
    chop($line);
    ($j1)=split(/[ \t]/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $tfilesys{$dev} = "swap";
    }
}
close IN;

# get information about local dump space
open (IN,"/usr/sbin/fdump -l |");
while ($line=<IN>)
{
    chop($line);
    ($j1)=split(/[ \t]/,$line);
    ($j3,$j4,$j5,$dev)=split(/\//,$j1);
    if ($j5 eq "dsk")
    {
        $tfilesys{$dev} = "dump";
    }
}
close IN;

#foreach (sort keys %tfilesys)
#{
#    print "[$_]=[$tfilesys{$_}]\n";
#}

#####################################################################
# done collecting filesystem information
#####################################################################
###
# The get_slice_info.sh script calls "/usr/alarm/bin/scsiot" which
# generates a list of devices, and tells if the device is "locked"
# by the local machine, the remote machine or neither.
# The script then calls "/sbin/prtvtoc" for each device not locked by
# the remote machine, and puts the results in /tmp/slices.dat.
# the output file from get_slice_info.sh has two lines for each slice
# we only want one line, so join them

#PT to open a file for writing, put
#   a ">" in front of the filename
open (OUT,">/tmp/slices2.dat");
open (IN,"/tmp/slices.dat");
while ($line=<IN>)
{
    chop($line);
    # break the line into three pieces, making the "breaks" at a tab character
    ($p1,$p2,$p3)=split(/\t/,$line);
    # if there is only one "part" then this must be the name of the device
    if ($p2 eq "")
    {
        $device = $p1;
    }
    else
    {
        # break the first "part" into "words" making the "breaks" at a space
        ($w1)=split(/ /,$p1);
        # if the first word is 'slice' then this is the first line of a pair
        # don't print it, save it (or we can print it without a '\n' in the
        # print statement)
        if ($w1 eq "slice")
        {
            $save=$line;
        }
        # if the first word is not 'slice' then this is the second line
        # print the previous "saved" line, and this one, separated by tabs
        else
        {
            print OUT "$device\t$save\t$line\n";
        }
    }
}
close IN;
close OUT;

#####################################################################
## done putting the vtoc info into "one line per device" format
#####################################################################

# now the lines are joined - pull out the appropriate parts
$filesys = "avail";
$percent="100%";
open (IN,"/tmp/slices2.dat");
while ($line=<IN>)
{
    chop($line);
    # split the line into "groups", making the break at a tab character
    {
        ($g1,$g2,$g3,$g4,$g5,$g6,$g7,$g8,$g9,$g10)=split(/[\t]+/,$line);
        # print "[$g1] [$g2] [$g3] [$g4] [$g5] [$g6] [$g7] [$g8] [$g9] [$g10]\n";
    }
    # split the first "group" saving the part before the first 's'
    ($base)=split(/s/,$g1);
    # split the second "group", making the break at a space
    ($junk,$num)=split(/[ \:]/,$g2);
    # convert to hexadecimal
    # prtvtoc returns "decimal" numbers while device files use "hex"
    if ( $num < 10)
    {
        $char = $num;
    }
    elsif ($num == 10)
    {
        $char = "a";
    }
    elsif ($num == 11)
    {
        $char = "b";
    }
    elsif ($num == 12)
    {
        $char = "c";
    }
    elsif ($num == 13)
    {
        $char = "d";
    }
    elsif ($num == 14)
    {
        $char = "e";
    }
    elsif ($num == 15)
    {
        $char = "f";
    }
    # split the 7th "group", making the break at a space, saving the first
    # part as the size
    ($size)=split(/ /,$g7);

    #PT the dot operator concatenates
    #   variables to build more complex
    #   strings
    # "build" the filename by concatenating (with the 'dot' operator)
    $dev=$base."s".$char;
    $source="/dev/rdsk/".$dev;

    #PT the stat command returns a list
    #   of items associated with a file
    #   including the owner's uid
    ($devi,$ino,$nmode,$nlink,$uid,$gid,$rdev,$junk)=stat($source);
    # the uid and gid are numeric - convert to text
    if ($uid == 110)
    {
        $owner="sybase";
    }
    else
    {
        $owner="root";
    }

    #PT run an external command and
    #   process the output of that
    #   command by opening a file with
    #   the "filename" being the
    #   external command ending with a
    #   pipe - perl will then read
    #   from the pipe

    # is the slice a filesystem?
    # determine this by looking at the array of filesystems
    $oflag = 0;
    $tflag = 0;
    if ($ofilesys{$dev} ne "")
    {
        $filesys = $ofilesys{$dev};
        $oflag = 1;
    }
    if ($tfilesys{$dev} ne "")
    {
        $filesys = $tfilesys{$dev};
        $tflag = 1;
    }
    if (($oflag == 1) && ($tflag == 1))
    {
        if ($ofilesys{$dev} ne $tfilesys{$dev})
        {
            print "duplicate: $dev local:$tfilesys{$dev} remote:$ofilesys{$dev}\n";
        }
    }
    # did we find a filesystem name?
    if ($filesys ne "avail")
    {
        #PT run an external command and
        #   process the output of that
        #   command by opening a file with
        #   the "filename" being the
        #   external command ending with a
        #   pipe - perl will then read
        #   from the pipe

        # it is a filesystem - how full is it?
        # determine this by running the dfspace command and grepping
        # through its output for the filesystem name
        ($fsys,$sec)=split(/ - /,$filesys);
        if ($sec eq "")
        {
            $sec = $fsys;
        }
        $percent = 0;
        open(FILE,"dfspace | grep $fsys |");
        while ($next=<FILE>)
        {
            chop($next);
            if ($next ne "")
            {
                ($fs,$f2)=split(/ /,$next);
                # if the filesystem from dfspace exactly matches $fsys
                if ($fs eq $fsys)
                {
                    # break the input into two parts, making the break at a '('
                    ($junk,$pct)=split(/\(/,$next);
                    # break $pct into two parts, making the break at a ')'
                    ($percent,$junk)=split(/\)/,$pct);
                }
            }
        }
        close FILE;
    }

    $mtotal=(($size / 2) / 1024);
    if ($owner eq "sybase")
    {
        $percent="????";
        $filesys=$devices{$dev};
        # compare as a string: "" is not a number
        if ($sizes{$dev} ne "")
        {
            $size = $sizes{$dev};
            $sused=(($usage{$dev} / $sizes{$dev}) * 100);
            $savail=(100 - $sused);
            $availability=sprintf "(%4d pcnt avail)",$savail;
        }
        else
        {
            $sused=-1;
            $availability = "( no information)";
        }
        #print "XX [$dev] [$devices{$dev}] [$usage{$dev}] [$sizes{$dev}] [$sused] [$savail]\n";
    }
    else
    {
        $availability="(".$percent." available)";
        ($value)=split(/%/,$percent);
        $mtotal=(($size / 2) /1024);
        $mavail=($mtotal * ($value / 100));
        $availability="(".$mavail." Megs Avail)";
        $availability=sprintf "(%4d Megs Avail)",$mavail;
    }
    if (($num != 7) && ($num != 0))
    {
        printf "%-12s %4d Megs Total %-18s %-6s %-28s\n", $dev,$mtotal,$availability,$owner,$filesys;
    }
    $filesys="avail";
    $percent="100%";
}
close IN;
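The long if/elsif chain that converts a slice number to a hex digit spells out each case by hand. As an alternative sketch (not what the script above does), sprintf with the "%x" format produces the same digits for 0 through 15. The base name "c1t1d0" is borrowed from the secondary boot device entries above.

```perl
use strict;
use warnings;

# Convert each slice number from 0 to 15 into its hex digit.
my @digits;
foreach my $num (0 .. 15)
{
    push(@digits, sprintf("%x", $num));
}
my $all = join("", @digits);
print "$all\n";              # 0123456789abcdef

# Build a device name the same way the script does, for slice 11.
my $base = "c1t1d0";
my $dev  = $base . "s" . sprintf("%x", 11);
print "$dev\n";              # c1t1d0sb
```

The explicit chain has the advantage that a beginner can read it without knowing any format codes, which may be why the script was written that way.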
