Perl script to print out TITLE lines

Charles Cave (cmc@sydney.unidata.oz.au)
Tue, 30 May 1995 04:06:30 +0500



Problem:

I wanted to produce a listing of TITLE lines for all
HTML files in my current directory.

Solution:

A perl script to do the job, called htmrpt

#!/usr/local/bin/perl
#
# give a report of the TITLE lines from all *.htm files

foreach $file (<*.htm*>) {
printf "%-20s ", $file;
open(HTFILE, $file);
while (<HTFILE>) {
chop;
$notitle = 1;
if (/^\<TITLE\>/) {
s/<TITLE>//;
s/<\/TITLE>//;
print;
print "\n";
$notitle = 0;
last; # break out of this loop (file)
}
if (/^\<title\>/) {
s/<title>//; # There must be an easier way to
s/<\/title>//; # solve this almost duplicate block of code!
print; # I am just a perl newbie
print "\n";
$notitle = 0;
last; # break out of this loop (file)
}
}
if ($notitle == 1) { print "*** No title found ***\n" };
close(HTFILE);
}

The following script htmrpt2.pl prints the Title Line first
so the script should be piped into sort for an alphabetical
listing by Title

#!/usr/local/bin/perl

foreach $file (<*.htm*>) {
open(HTFILE, $file);
while (<HTFILE>) {
chop;
$notitle = 1;
if (/^\<TITLE\>/) {
s/<TITLE>//;
s/<\/TITLE>//;
print;
$notitle = 0;
last; # break out of this loop (file)
}
if (/^\<title\>/) {
s/<title>//;
s/<\/title>//;
print;
$notitle = 0;
last; # break out of this loop (file)
}
}
if ($notitle == 1) { print "*** No title found *** " };
close(HTFILE);
print " ($file)\n";
}

---------------------------------------------------------------------------
Charles Cave ~ .-_|\ Phone +61 2 416 6877
Customer Services Manager / \ ~ Fax +61 2 416 2086
Unidata Australasia \.--._*<--- Level 2, 280 Pacific Hwy
cmc@sydney.unidata.oz.au ~ v Lindfield, NSW, 2070 AUSTRALIA
---------------------------------------------------------------------------