Re: HTML -> ASCII?

Dale Dougherty (dale@ora.com)
Mon, 8 Nov 1993 23:29:32 -0800


The simplest approach is a sed script that removes HTML tags,
that is, anything between a pair of angle brackets.

s/<[^>]*>//g

You can obviously build more complicated scripts in sed, awk or perl.
The above script will strip out link information because HREF
is an attribute inside the tag.
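
As a rough sketch of a "more complicated" script, the URL can be copied
out of the anchor before the tags are stripped. The function name
strip_tags_keep_links is made up for this example, and the pattern
assumes a double-quoted HREF, an anchor that starts and ends on one line,
and no other tags inside the link text. The tag-stripping step is written
here as s/<[^>]*>//g.

```shell
# Hypothetical helper: reads HTML on stdin, writes plain text on stdout.
# Assumes: HREF="..." is double-quoted, each <A ...>text</A> sits on one
# line, and the link text contains no nested tags.
strip_tags_keep_links() {
    sed -e 's/<[Aa] [^>]*[Hh][Rr][Ee][Ff]="\([^"]*\)"[^>]*>\([^<]*\)<\/[Aa]>/\2 [\1]/g' \
        -e 's/<[^>]*>//g'
}

printf '%s\n' '<p>See <a href="http://example.com/">this page</a>.</p>' |
    strip_tags_keep_links
# prints: See this page [http://example.com/].
```

The first expression rewrites each anchor as its text followed by the URL
in square brackets; the second removes whatever tags remain.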

Such seat-of-the-pants conversions depend on how consistent the
HTML coding is. This is by no means a general solution.
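
One concrete way such a script falls short, shown as a small sketch: sed
applies the substitution one line at a time, so a tag that is split
across two lines never matches as a whole. The function name strip_tags
and the sample input are made up for this illustration.

```shell
# Hypothetical helper: the one-line tag stripper wrapped in a function.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# The opening <a ...> tag is split across two input lines.
printf '%s\n' '<a' 'href="x.html">link</a>' | strip_tags
# prints:
# <a
# href="x.html">link
```

Only the closing tag, which sits wholly on the second line, is removed;
the pieces of the split opening tag are left in the output.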

-- 
Dale Dougherty (dale@ora.com) 
Publisher, Global Network Navigator, O'Reilly & Associates, Inc.
103A Morris Street, Sebastopol, California 95472 
(707) 829-3762 (home office); 1-800-998-9938