03/01/2003: Using Perl to Generate a Table of Contents for HTML Pages
This script creates a Table of Contents page for HTML documents. It reads the files listed on the command line (wildcards are OK), searches them for HTML heading tags (<Hn>), and writes a nested list of links to fulltoc.htm.
To index an entire directory use: perl toc.pl *.html
NOTE: The files are read in alphabetical order, which may not be the order that you need. Simply cut and paste the resulting HTML until the order is correct.
#!/usr/bin/perl -w
#
# To index an entire directory use:
#     perl toc.pl *.html
#
use strict;

# holds the name of each file
# as it is being processed.
my($file);

# holds the text of the heading
# (from the anchor tag).
my($heading);

# holds the last heading level
# for comparison.
my($oldLevel);

# holds each heading found in the file
# as it is being processed.
my($line);

# used as temporary variables
# to shorten script line widths
my($match);
my($href);

# holds the name of the heading
# from the anchor tag.
my($name);

# holds the level of the current heading.
my($newLevel);

# First, I open an output file and print the
# beginning of the HTML that is needed.
# (Declared with my so that "use strict" accepts it.)
#
my($outputFile) = "fulltoc.htm";
open(OUT, ">", $outputFile)
    or die("Cannot write $outputFile: $!\n");
print OUT ("<HTML><HEAD><TITLE>");
print OUT ("Detailed Table of Contents\n");
print OUT ("</TITLE></HEAD><BODY>\n");

# Now, loop through every file in the command
# line looking for Headers. When found, look
# for an Anchor tag so that the NAME attribute can
# be used. The NAME attribute might be different
# from the actual heading.
#
foreach $file (sort(@ARGV)) {
    next if $file =~ m/^\.htm$/i;

    # never index the output file itself.
    next if $file eq $outputFile;

    print("$file\n");
    if (!open(INP, "<", $file)) {
        warn("Cannot read $file: $!\n");
        next;
    }

    print OUT ("<UL>\n");
    $oldLevel = 1;

    while (<INP>) {
        # capture the heading and its level in one
        # case-insensitive match.
        if (m!(<H([1-6])>.+?</H\d>)!i) {
            $line     = $1;
            $newLevel = $2;

            # remove anchors from header.
            $match = '<A NAME="(.+?)">(.+?)</A>';
            if ($line =~ m!$match!i) {
                $name    = $1;
                $heading = $2;
            }
            else {
                # no anchor: the heading text is
                # used as the name as well.
                $match = '<H\d>(.+?)</H\d>';
                $line =~ m!$match!i;
                $name    = $1;
                $heading = $1;
            }

            # close or open one list per level so that
            # jumps such as H1 to H3 stay balanced.
            while ($oldLevel > $newLevel) {
                print OUT ("</UL>\n");
                $oldLevel--;
            }
            while ($oldLevel < $newLevel) {
                print OUT ("<UL>\n");
                $oldLevel++;
            }

            $href = "\"$file#$name\"";
            print OUT ("<LI>");
            print OUT ("<A HREF=$href>");
            print OUT ("$heading</A>\n");
        }
    }

    # close every list that is still open.
    while ($oldLevel--) {
        print OUT ("</UL>\n");
    }
    close(INP);
}

# End the HTML document and close the output file.
#
print OUT ("</BODY></HTML>");
close(OUT);
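To see how the script decides on the link target and the link text, here is a minimal sketch that applies patterns modeled on the script's to a few sample lines. The sub name parse_heading and the sample headings are made up for illustration and are not part of toc.pl; the sketch also captures the heading level in the first match, which is an assumption about how one might arrange it.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not in toc.pl): returns (level, name, heading)
# for a line that contains a heading, or an empty list otherwise.
sub parse_heading {
    my($text) = @_;
    return () unless $text =~ m!(<H([1-6])>.+?</H\d>)!i;
    my($line, $level) = ($1, $2);

    # Prefer the anchor's NAME attribute when one is present.
    if ($line =~ m!<A NAME="(.+?)">(.+?)</A>!i) {
        return ($level, $1, $2);
    }

    # Otherwise the heading text doubles as the name.
    $line =~ m!<H\d>(.+?)</H\d>!i;
    return ($level, $1, $1);
}

# Made-up sample lines.
my(@samples) = (
    '<H2><A NAME="install">Installing Perl</A></H2>',
    '<h3>Running the Script</h3>',
    '<P>No heading here.</P>',
);

foreach my $sample (@samples) {
    my($level, $name, $heading) = parse_heading($sample);
    if (defined($level)) {
        print("level=$level name=$name heading=$heading\n");
    }
    else {
        print("no heading found\n");
    }
}
```

Note that when a heading has no anchor, the generated link points at a fragment named after the heading text, which only works if the page actually defines an anchor with that name.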
03/01/2003: How to Reduce HTML Size Yet Still Use Long Style Names Using ColdFusion
Here's a suggestion, donated by Steve Runyon, for dealing with long style names.
Long style names, like "tabCellSelected" or "tabCellUnselected", cause HTML page sizes to grow. For example, if you have a table with 8 columns and 20 rows, with 1 column selected, you're spending 2680 bytes on the style names alone: 7 unselected columns at 17 characters per name plus 1 selected column at 15 characters per name, over 20 rows. ((7 * 20 * 17) + (1 * 20 * 15) = 2680)
The page size can be reduced by creating variables (application-scoped ones, for example) named after each style, whose values are short placeholders:
<cfset application.TabCellSelected_sty = "s001">
<cfset application.TabCellUnselected_sty = "s002">
Your style definitions then look like this (the #...# expressions are only evaluated inside a <cfoutput> block):
TD.#application.TabCellSelected_sty# { [etc] }
TD.#application.TabCellUnselected_sty# { [etc] }
And your generated HTML looks like this:
<td class=s001>data</td>
<td class=s002>data</td>
On that same 8x20 table, you're now using only 640 bytes for the style names (8 * 20 * 4 = 640), a saving of 2040 bytes.
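The byte counts above can be checked with a short script. This is only a sketch: the sub names class_name_bytes and short_names are hypothetical, and numbering the placeholders in sorted order is an assumption that happens to reproduce the s001 and s002 assignments shown earlier.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: total bytes spent on class names, given
# a hash of class name => number of cells that use it.
sub class_name_bytes {
    my(%cells) = @_;
    my($total) = 0;
    foreach my $name (keys(%cells)) {
        $total += length($name) * $cells{$name};
    }
    return $total;
}

# Hypothetical helper: map each long name to a placeholder such as
# "s001", numbering the names in sorted order (an assumption).
sub short_names {
    my(@long) = sort(@_);
    my(%short);
    my($n) = 0;
    foreach my $name (@long) {
        $short{$name} = sprintf("s%03d", ++$n);
    }
    return %short;
}

# The 8-column, 20-row table from the text, with 1 column selected.
my(%long) = (
    "tabCellSelected"   => 1 * 20,
    "tabCellUnselected" => 7 * 20,
);

# Re-key the cell counts by placeholder name.
my(%short) = short_names(keys(%long));
my(%shortCells);
foreach my $name (keys(%long)) {
    $shortCells{$short{$name}} = $long{$name};
}

print("long names:  ", class_name_bytes(%long), " bytes\n");
print("short names: ", class_name_bytes(%shortCells), " bytes\n");
```

Run as written, it reports 2680 bytes for the long names and 640 bytes for the placeholders, matching the figures in the text.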