HtmlZap ATL ActiveX Control

Download
Installing HtmlZap
HtmlZap with Web ActiveX controls
HtmlZap with Scripting Languages
Component source code
Legal stuff
Contact Information
Revision history
Archive

 

HtmlZap isn't exciting to look at. In fact, it's invisible!

However, this little control can make your life much easier if you need to parse HTML files for one reason or another. In essence, HtmlZap is a "canned" HTML parsing engine.

If you feed HtmlZap an HTML file, it will dice the text and tags faster than a Ginsu® and give you the pieces one at a time for your code to digest. The control takes on all the tedious work of stepping through the HTML code; all you need to do is act on what it finds. You can even modify the tag by setting properties and write it out again!

Since I wrote this control, I've used it to perform dozens of tedious HTML reformatting and copying tasks. It makes processing HTML so simple that you can afford to write little Visual Basic programs to solve your HTML mangling problems.

I have one (now very large) VB project that I use whenever I need to do something non-trivial to an HTML file. I just add a new button or form for each task, write a few lines of code, click the new button, and get on with life.

 

Download

Download the HtmlZap for Windows COM component:

HtmlZap_1110.zip, 72KB, 21 January 2012

 

Installing HtmlZap

Installing HtmlZap is fairly straightforward. Begin by downloading the HtmlZap ZIP file. Once you've got the ZIP file, decompress it into a fresh directory. The only remaining step is to "register" the control; most programming languages and development environments for Windows provide a simple way of doing this. In VB 6, you select the "Project / Components" command, then click the "Browse" button on the "Controls" page of the resulting dialog box.

VB will pop up a "File Open" dialog box; navigate to the directory where you unzipped HtmlZap, then "open" htmlzap.dll. After a brief pause, "HtmlZap ATL Control" will appear in the list of available components.

To use the control in your VB application, make sure the "HtmlZap ATL Control" item is checked in the Components list, then simply draw an HtmlZap control on the form where you want to use it. Change the name to something useful, then use the control's properties and methods in your code. That's it!

Though I've concentrated on Visual Basic so far, there's no reason why you couldn't use HtmlZap controls with other languages. HtmlZap should work with any language or development environment that supports ActiveX (OCX) controls, including scripting languages like VBScript and ASP.

[Screenshots: HtmlZap on a VB form; HtmlZap in the VB tool palette]

You may want to look at the HtmlZap help page and the sample code.
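
If you plan to use HtmlZap from a script rather than from the VB IDE, the control still has to be registered first. The following is only a sketch of one way to do that with the standard Windows regsvr32 tool. It assumes that htmlzap.dll sits in the same folder as the script, that the script runs with administrator rights, and that the ProgID is the one used in the scripting example on this page. On 64-bit Windows, a 32-bit build of the DLL would need the 32-bit versions of regsvr32 and the script host; which applies to your copy is not covered here.

```vbscript
' Sketch: register htmlzap.dll, then confirm the control can be created.
' Assumption: htmlzap.dll is in the same folder as this script.
Option Explicit

Dim fso, shell, dllPath, exitCode, hz

Set fso = CreateObject("Scripting.FileSystemObject")
Set shell = CreateObject("WScript.Shell")

dllPath = fso.BuildPath(fso.GetParentFolderName(WScript.ScriptFullName), "htmlzap.dll")

' regsvr32 /s registers silently. Run(..., 0, True) hides the window,
' waits for regsvr32 to finish, and returns its exit code.
exitCode = shell.Run("regsvr32 /s """ & dllPath & """", 0, True)
If exitCode <> 0 Then
    WScript.Echo "regsvr32 failed with exit code " & exitCode
    WScript.Quit 1
End If

' Assumption: the ProgID matches the one in the scripting example.
On Error Resume Next
Set hz = CreateObject("HtmlZap.HtmlZap.1")
If Err.Number <> 0 Then
    WScript.Echo "Registered, but CreateObject failed: " & Err.Description
    WScript.Quit 1
End If
On Error GoTo 0

WScript.Echo "HtmlZap is registered and ready to use."
```

Save the script next to the DLL under any name you like and run it with cscript from an elevated command prompt.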

Using HtmlZap as a COM object instead of a component

While using HtmlZap as a component on your form is simple, it may be more efficient to use it as an object declared locally in the method where you need it. You can declare the object with a VB statement like:

Dim hz As New HTMLZAPLib.HtmlParse

Important: If you want to use HtmlZap in this way you should add it to your VB project in the "References" dialog box, not the "Components" dialog box.

 

Using HtmlZap with Web ActiveX objects

HtmlZap was built with Microsoft's ATL lightweight COM framework. That's jargon which basically means that the control is very small (64K, not bad even for a DOS app!) and that it doesn't use any of the gargantuan MFC DLLs. In particular, HtmlZap doesn't need mfc42.dll (the main Microsoft Foundation Classes DLL) or msvcrt.dll (the Microsoft C Runtime Library DLL).

Boiled down to the bottom line, this means that HtmlZap needs only the basic APIs supplied by Windows 9x and NT. You don't have to ship any "redistributable" DLLs with your HtmlZap-based application (unless, of course, your app needs them).

This is especially important if you're building ActiveX controls for the Web (somebody out there must be! <g>). If your ActiveX control needs redistributable DLLs, the people who visit your Web site are going to have to wait for those DLLs to download before your control can start. Not a good thing, since some of those DLLs are near a megabyte in size.

You can use HtmlZap to build Web ActiveX components with confidence, because it's entirely self-reliant, and it only adds 64K to your download.

 

HtmlZap and scripting languages

HtmlZap can be called from scripting languages like ASP and VBScript. Here's an example of a VBScript that prints all the hyperlinks in an HTML document to the screen:

'
' HtmlZap test
'
option explicit

dim hz

set hz = CreateObject("HtmlZap.HtmlZap.1")

hz.load "index.htm"

' step through the document one piece at a time
while not hz.eof
    ' print the target of every <a> tag
    if hz.tagname = "a" then
        wscript.echo hz.param("href")
    end if

    hz.next
wend
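
The example above hard-codes the file name and compares the tag name against a lower-case "a". As a hedged variation, the sketch below takes the file name from the command line, compares tag names case-insensitively, and skips anchors that carry no href. It uses only the HtmlZap members shown above (load, eof, tagname, param, next). The script name links.vbs is hypothetical, and whether HtmlZap already normalises the case of tag names, or what it returns for a missing attribute, is an assumption to check against the help page.

```vbscript
' Sketch: list the links in an HTML file named on the command line.
Option Explicit

Dim hz, fileName, linkCount

If WScript.Arguments.Count < 1 Then
    WScript.Echo "Usage: cscript links.vbs file.htm"
    WScript.Quit 1
End If
fileName = WScript.Arguments(0)

Set hz = CreateObject("HtmlZap.HtmlZap.1")
hz.load fileName

linkCount = 0
While Not hz.eof
    ' LCase guards against upper-case tags such as <A HREF=...>.
    If LCase(hz.tagname) = "a" Then
        ' Assumption: a missing attribute yields an empty (or Null) value,
        ' so anchors like <a name="top"> are skipped here.
        If Len(hz.param("href")) > 0 Then
            WScript.Echo hz.param("href")
            linkCount = linkCount + 1
        End If
    End If
    hz.next
Wend

WScript.Echo linkCount & " link(s) found"
```

Run it as, for example, cscript //nologo links.vbs index.htm.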

 

HtmlZap Source Code

The source code for the HtmlZap component is available below. The current version was compiled and tested on Windows 7 x64 using Microsoft Visual Studio 2010.

Using the source

If you create something new using the source supplied from this site, please remember:

  • HtmlZap is published under the GNU General Public License. Commercial reuse is not free.
     
  • The HtmlZap source code is Copyright (C) 1997-2013 Michael Newcomb.
     
  • If you create a new COM component based on this source, please give it a distinctive name and appearance and use new GUIDs and IIDs. This will permit your component to coexist with the "official" HtmlZap component.
     
  • NO technical support will be provided for the source code beyond the comments in the files.

Most importantly, if you fix a bug or add an enhancement, please let me know! I'd love to incorporate your improvements into the official source!

Platform SDK

Important! If you decide to build HtmlZap from the source code, you need to have Microsoft's Platform SDK installed on your development system. The default set of libraries and header files installed with Visual Studio will not work; you'll get errors like: 'UrlCombineW' : undeclared identifier. You also need to make sure that the HtmlZap project file points to the version of ShLwAPI.lib supplied with the Platform SDK.

Download source

HtmlZap's source code is available from GitHub:

git://github.com/maiken2051/htmlzap.git

https://github.com/maiken2051/htmlzap

Legal Stuff

HtmlZap is freeware published under the GNU General Public License without any warranties of any kind whatsoever! Use at your own risk!

HtmlZap is Copyright © 1997, 2001, 2002, 2013 Michael Newcomb.

 

Contact Information

If you have any questions, comments, feature suggestions, or problems, please don't hesitate to send me mail at htmlzap@miken.com.

 

Revision History

Version 1.1.1 -- 23 January 2013

  • Added the ClosedTag property to support "XML-style" HTML, e.g., <img ... />

Version 1.1 -- 23 June 2002

  • Fixed problem that prevented some Windows 9x machines from accessing tag attributes using the Param("name") property.
     
  • Fixed bug that caused the control to drop foreign-language characters when CompressWS was turned on.

 

Archive

Previous versions of HtmlZap.

V1.1: HtmlZap 1.1 for Windows COM component. HtmlZap_1100.zip, 30KB, 23 June 2002
V1.0: HtmlZap 1.0 for Windows COM component. HtmlZap_1010.zip, 28KB, 30 December 2001

 

Last revised: 21 January 2013