Using Perl to Generate a Table of Contents for HTML Pages

This script creates a Table of Contents page for HTML documents. It reads the files listed on the command line (wildcards are OK), searches each one for HTML heading tags (<H1> through <H6>), and writes a nested list of links to fulltoc.htm.

To index an entire directory, use: perl toc.pl *.html (toc.pl is a placeholder; substitute the file name under which you saved the script).

NOTE: The files are read in sorted (alphabetical) order, which may not be the order that you need. If it is not, cut and paste the entries in the resulting HTML until the order is correct.
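
Before the full script, it may help to see the heading match on its own. The following is a minimal standalone sketch, not part of the script itself: parse_heading and the sample lines are hypothetical names and data made up for illustration. It assumes, as the script does, that a heading opens and closes on a single line and that its tag carries no attributes.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not in the original script): given one line of
# HTML, return (level, name, heading) for the first <Hn>...</Hn> found,
# or an empty list when the line contains no heading.
sub parse_heading {
    my($text) = @_;
    return () unless $text =~ m!<H(\d)>(.+?)</H\d>!i;
    my($level, $inner) = ($1, $2);

    # Prefer the NAME attribute of an anchor inside the heading.
    if ($inner =~ m!<A NAME="(.+?)">(.+?)</A>!i) {
        return ($level, $1, $2);
    }

    # Otherwise the heading text doubles as the link target.
    return ($level, $inner, $inner);
}

# Made-up sample lines: anchored heading, plain heading, no heading.
my(@samples) = (
    '<H1><A NAME="intro">Introduction</A></H1>',
    '<h2>Getting Started</h2>',
    '<P>No heading here.</P>',
);

foreach my $sample (@samples) {
    my($level, $name, $heading) = parse_heading($sample);
    if (defined($level)) {
        print("level=$level name=$name heading=$heading\n");
    }
    else {
        print("no heading\n");
    }
}
```

In the second sample the heading text is reused as the link name. A link built that way only jumps to the heading if the page happens to contain an anchor with that same name, so headings wrapped in <A NAME="..."> give the most reliable results.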

#!/usr/bin/perl -w
# To index an entire directory use:
#     perl toc.pl *.html
# (toc.pl is a placeholder for this script's file name)
use strict;

# holds the name of each file
# as it is being processed.
my($file);

# holds the text of the heading
# (from the anchor tag).
my($heading);

# holds the last heading level
# for comparison.
my($oldLevel);

# holds each line of the file
# as it is being processed.
my($line);

# used as a temporary variable
# to shorten script line widths.
my($match);

# holds the name of the heading
# from the anchor tag.
my($name);

# holds the level of the current heading.
my($newLevel);

# First, I open an output file and print the
# beginning of the HTML that is needed.
my($outputFile) = "fulltoc.htm";
open(OUT, ">$outputFile")
    or die("Cannot open $outputFile: $!\n");
print OUT ("<HTML><HEAD><TITLE>");
print OUT ("Detailed Table of Contents\n");
print OUT ("</TITLE></HEAD><BODY>\n");

# Now, loop through every file on the command
# line looking for headings. When one is found, look
# for an anchor tag so that the NAME attribute can
# be used. The NAME attribute might be different
# from the actual heading.
foreach $file (sort(@ARGV)) {
    next if $file =~ m/^\.htm$/i;
    if (!open(INP, "<$file")) {
        warn("Cannot open $file: $!\n");
        next;
    }
    print OUT ("<UL>\n");
    $oldLevel = 1;
    while (<INP>) {
        if (m!(<H\d>.+?</H\d>)!i) {
            # remove anchors from header.
            $line = $1;
            $match = '<A NAME="(.+?)">(.+?)</A>';
            if ($line =~ m!$match!i) {
                $name = $1;
                $heading = $2;
            }
            else {
                $match = '<H\d>(.+?)</H\d>';
                $line =~ m!$match!i;
                $name = $1;
                $heading = $1;
            }

            # pull the digit out of <Hn> to get the level.
            $line =~ m!<H(\d)>!i;
            $newLevel = $1;

            # close or open one list per level crossed.
            while ($oldLevel > $newLevel) {
                print OUT ("</UL>\n");
                $oldLevel--;
            }
            while ($oldLevel < $newLevel) {
                print OUT ("<UL>\n");
                $oldLevel++;
            }

            my($href) = "\"$file#$name\"";
            print OUT ("<LI>");
            print OUT ("<A HREF=$href>");
            print OUT ("$heading</A>\n");
        }
    }
    close(INP);

    # close every list that is still open for this file.
    while ($oldLevel--) {
        print OUT ("</UL>\n");
    }
}

# End the HTML document and close the output file.
print OUT ("</BODY></HTML>\n");
close(OUT);
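
Heading levels can jump by more than one step, for example from <H1> straight to <H3>. For the generated lists to stay balanced, one <UL> or </UL> is needed for each level crossed. The bookkeeping can be sketched on its own as follows; level_tags and the sequence of levels are hypothetical, chosen only to illustrate the idea, and the sketch stands apart from the script above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the list tags needed to move from one
# heading level to another, one tag per level crossed.
sub level_tags {
    my($old, $new) = @_;
    return "<UL>\n"  x ($new - $old) if $new > $old;
    return "</UL>\n" x ($old - $new) if $old > $new;
    return "";
}

# Walk a made-up sequence of heading levels, starting inside one <UL>
# at level 1, just as the script does for each file.
my($html)  = "<UL>\n";
my($level) = 1;
foreach my $next (1, 3, 2) {
    $html .= level_tags($level, $next) . "<LI>H$next\n";
    $level = $next;
}

# Close every list that is still open.
$html .= "</UL>\n" x $level;
print($html);
```

For the levels 1, 3, 2 this opens three lists in total (the initial one plus two for the jump to level 3) and closes three (one on the step back to level 2, two at the end), so the output nests correctly.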