Using Perl to Generate a Table of Contents for HTML Pages
This script creates a Table of Contents page for HTML documents. It reads the files listed on the command line (wildcards are OK) and searches them for HTML heading tags (<H1>, <H2>, and so on). The result is written to fulltoc.htm.
To index an entire directory use: perl toc.pl *.html
NOTE: The files will be read in alphabetical order, which may not be the order that you need. Simply cut and paste the resulting HTML until the order is correct.
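If the cut and paste step becomes tedious, one alternative is to process the files in the order they are given on the command line instead of sorting them. The following is only a sketch of that idea: the helper name files_in_given_order is hypothetical and is not part of the original script. It keeps the caller's order and drops any file listed twice.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not in the original script): return the
# file names in the order given, skipping repeats, instead of
# sorting them alphabetically.
sub files_in_given_order {
    my (@files) = @_;
    my %seen;
    return grep { !$seen{$_}++ } @files;
}

# Print the order in which the files would be processed.
my @ordered = files_in_given_order(@ARGV);
print("$_\n") foreach @ordered;
```

Assuming the script's loop used such a list in place of sort(@ARGV), a call like perl toc.pl intro.html setup.html reference.html would index the files in exactly that order. Note that a shell wildcard such as *.html is usually expanded alphabetically before Perl sees it, so this only helps when the files are named explicitly.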
#!/usr/bin/perl -w
#
# To index an entire directory use:
#    perl toc.pl *.html
#
use strict;

# holds the name of each file
# as it is being processed.
my($file);

# holds the text of the heading
# (from the anchor tag).
my($heading);

# holds the last heading level
# for comparison.
my($oldLevel);

# holds each line of the file
# as it is being processed.
my($line);

# used as temporary variables
# to shorten script line widths
my($match);
my($href);

# holds the name of the heading
# from the anchor tag.
my($name);

# holds the level of the current heading.
my($newLevel);

# holds the name of the output file
# (must be declared under "use strict").
my($outputFile);

# First, I open an output file and print the
# beginning of the HTML that is needed.
#
$outputFile = "fulltoc.htm";
open(OUT, ">$outputFile") or die("Cannot open $outputFile: $!");
print OUT ("<HTML><HEAD><TITLE>");
print OUT ("Detailed Table of Contents\n");
print OUT ("</TITLE></HEAD><BODY>\n");

# Now, loop through every file in the command
# line looking for Headers. When found, look
# for an Anchor tag so that the NAME attribute can
# be used. The NAME attribute might be different
# from the actual heading.
#
foreach $file (sort(@ARGV)) {
    next if $file =~ m/^\.htm$/i;

    print("$file\n");
    open(INP, "$file") or die("Cannot open $file: $!");
    print OUT ("<UL>\n");
    $oldLevel = 1;

    while (<INP>) {
        if (m!(<H\d>.+?</H\d>)!i) {

            # remove anchors from header.
            $line = $1;
            $match = '<A NAME="(.+?)">(.+?)</A>';
            if ($line =~ m!$match!i) {
                $name = $1;
                $heading = $2;
            }
            else {
                $match = '<H\d>(.+?)</H\d>';
                $line =~ m!$match!i;
                $name = $1;
                $heading = $1;
            }

            # get the heading level; the /i keeps this
            # consistent with the match above.
            m!<H(\d)>!i;
            $newLevel = $1;

            # close or open one list per level of
            # difference so the nesting stays balanced.
            while ($oldLevel > $newLevel) {
                print OUT ("</UL>\n");
                $oldLevel--;
            }
            while ($oldLevel < $newLevel) {
                print OUT ("<UL>\n");
                $oldLevel++;
            }

            $href = "\"$file#$name\"";
            print OUT ("<LI>");
            print OUT ("<A HREF=$href>");
            print OUT ("$heading</A>\n");
        }
    }

    # close every list that is still open for this file.
    while ($oldLevel--) {
        print OUT ("</UL>\n");
    }
    close(INP);
}

# End the HTML document and close the output file.
#
print OUT ("</BODY></HTML>");
close(OUT);
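To see how the heading match works in isolation, the sketch below applies the same two patterns to a few sample lines. The helper name parse_heading and the sample lines are illustrative assumptions, not part of the script above. Like the script, it only recognizes a heading whose opening and closing tags sit on one line and whose opening tag has no attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: return (level, name, heading text) for one
# line, or an empty list when the line holds no <Hn>...</Hn> pair.
sub parse_heading {
    my ($line) = @_;

    # Capture the heading level and everything between the tags.
    return () unless $line =~ m!<H(\d)>(.+?)</H\d>!i;
    my ($level, $inner) = ($1, $2);

    # Prefer the anchor's NAME attribute when one is present.
    if ($inner =~ m!<A NAME="(.+?)">(.+?)</A>!i) {
        return ($level, $1, $2);
    }

    # Otherwise the heading text doubles as the name.
    return ($level, $inner, $inner);
}

# Sample input lines, made up for illustration.
my @sample = (
    '<H1><A NAME="intro">Introduction</A></H1>',
    '<H2>Getting Started</H2>',
    '<P>No heading here.</P>',
);

foreach my $line (@sample) {
    my ($level, $name, $heading) = parse_heading($line);
    next unless defined $level;
    print("level $level, name '$name', heading '$heading'\n");
}
```

The first sample line yields the name intro from the anchor, the second falls back to the heading text for both name and heading, and the third is skipped. When no anchor is present the generated link points at a name that may not exist in the target page, so headings without an <A NAME="..."> anchor are best given one before the index is built.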