HtmlZap Documentation

Most programs that use the HtmlZap control will probably follow this basic algorithm:

  • Load an HTML file into the control using the Load or LoadBuffer method. If the load succeeds, the control's properties will immediately reflect the first HTML tag or "slice" of text in the file.
  • If the IsTag property is True, the current slice is an HTML tag. Check the TagName property to find out which tag, then process the tag's parameters using the Param(name), PName(index), and PValue(index) properties as needed.
  • If the IsTag property is False, the current slice is text "between" tags, which you can retrieve from the control's Text property. Process this text as needed.
  • Use the Next method to advance to the next "slice." Loop until the control's EOF property becomes True, indicating that you've processed the whole file.
  • Prepare the control for another file with the Reset method.
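
The steps above can be sketched as a single routine. This is only an illustration, not the official sample code: the procedure name DumpSlices and its fileName argument are made up for the example, and "HZ" is assumed to be an HtmlZap control on the current form.

```vb
' Hypothetical helper: walks a file slice by slice and prints what it finds.
Private Sub DumpSlices(ByVal fileName As String)
    Dim tagCount As Long, textCount As Long

    HZ.Load fileName                ' raises an error if the load fails

    While Not HZ.EOF
        If HZ.IsTag Then
            ' Current slice is a tag: TagName says which one
            tagCount = tagCount + 1
            Debug.Print "Tag:  "; HZ.TagName
        Else
            ' Current slice is text "between" tags
            textCount = textCount + 1
            Debug.Print "Text: "; HZ.Text
        End If
        HZ.Next                     ' advance to the next slice
    Wend

    Debug.Print tagCount; "tags,"; textCount; "text slices"
    HZ.Reset                        ' ready for another file
End Sub
```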

HtmlZap Properties and Methods

For the purposes of the sample code in this document, the object name "HZ" refers to an instance of the HtmlZap control.
HtmlZap properties and methods are only useful at runtime.

Property ClosedTag

  • Boolean
  • Read-only

This property will be True when the current tag is a "singleton" (self-closing) tag as found in XML and XHTML, e.g., <img ... />. You may need to use this property if your code distinguishes between "open" and "close" tags, since a "closed tag" acts as both.

Property CompressWS

  • Boolean
  • Default = False

If you set this property to True, HtmlZap will compress runs of multiple whitespace characters (tabs, spaces, carriage returns, etc.) to a single space. This is most useful if you're interpreting the HTML for display; browsers do the same thing. It's best to leave this property set to False if you're planning on writing out modified HTML code: the extra whitespace will make the resulting file much more legible.

Property EOF

  • Boolean
  • Read-only

This boolean value changes to True when the last slice has been read from the source HTML file.

Property HdrLevel

  • Integer
  • Read-only

Returns the heading level n when the current slice is an <Hn> tag, and zero for non-heading tags and text. Useful for detecting when you've entered a new "section" of an HTML file, as long as the sections are delimited by <Hn> tags.
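
As a rough sketch of how HdrLevel might be used to print an outline of a file: the procedure name PrintOutline is hypothetical, and this document doesn't say whether closing tags such as </h2> also report a nonzero HdrLevel, so you may see each heading twice.

```vb
' Hypothetical helper: prints every heading tag, indented by its level.
Private Sub PrintOutline(ByVal fileName As String)
    HZ.Load fileName

    While Not HZ.EOF
        If HZ.HdrLevel > 0 Then
            ' Two spaces of indent per level below <H1>
            Debug.Print Space$(2 * (HZ.HdrLevel - 1)); "<"; HZ.ToString; ">"
        End If
        HZ.Next
    Wend

    HZ.Reset
End Sub
```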

Property IsTag

  • Boolean
  • Read-only

Indicates that the current slice is a tag. The tag's name is available from the TagName property, and the parameters (if any) are accessible using the Param, PName, and PValue properties.

If this property is False, the current slice is text, which can be retrieved using the Text property.

Method Load

  • Usage: object.Load filename

Loads an HTML file for processing. The control executes the Reset method before loading, so any previous file and position information will be lost.

Assuming the load operation is successful, the control's properties will reflect the nature of the first "slice" in the HTML file. If the load fails, the control will throw an error. In Visual Basic, you can examine the Err object (Err.Number, Err.Description) to determine what went wrong.

Note: The HtmlZap control will keep the loaded file open until the Reset method is executed, or until the control is destroyed.

Technical Note: For those who may be concerned, HtmlZap doesn't actually read the file during the Load operation. The control uses a memory-mapped file to process the HTML, so "nothing much" actually happens while the Load method is executing. The method just sets up the mapped file and initializes the control to the first slice.
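
Since Load reports failure by throwing an error, you'll usually want to trap it. One possible pattern is shown here; the function name TryLoad is invented for this example, and the specific error numbers the control raises aren't documented here, so the code only reports whatever it gets.

```vb
' Hypothetical wrapper: returns True if the file loaded, False otherwise.
Private Function TryLoad(ByVal fileName As String) As Boolean
    On Error Resume Next            ' trap the error instead of halting
    HZ.Load fileName

    If Err.Number <> 0 Then
        Debug.Print "Load failed:"; Err.Number; Err.Description
        Err.Clear
        TryLoad = False
    Else
        TryLoad = True              ' properties now reflect the first slice
    End If

    On Error GoTo 0                 ' restore normal error handling
End Function
```

A caller could then write "If TryLoad(path) Then" followed by the usual While Not HZ.EOF loop.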

Method LoadBuffer

  • Usage: object.LoadBuffer byteArray or string

Loads HTML for processing from a Byte Array or String. The control executes the Reset method before loading, so any previous file and position information will be lost.

Assuming the load operation is successful, the control's properties will reflect the nature of the first "slice" in the HTML file. If the load fails, the control will throw an error. In Visual Basic, you can examine the Err variable to determine what went wrong.

This method is designed for use with ActiveX Web objects. Specifically, you can use the LoadBuffer method in an ActiveX control's AsyncReadComplete event handler to begin processing HTML retrieved from a server using the AsyncRead method.

For example, if you execute a VB statement like this:

    AsyncRead "http://poit.narf.org/brain/index.htm", vbAsyncTypeByteArray

you'll need an event handler that looks something like this (assuming your HtmlZap control is called "HZ"):

    Private Sub UserControl_AsyncReadComplete(AsyncProp As AsyncProperty)
        '
        ' File read completed
        '
        Dim es As Long              ' Err.Number is a Long

        ' Trap errors from the HTTP transfer or from LoadBuffer
        On Error Resume Next
        HZ.LoadBuffer AsyncProp.Value
        es = Err.Number
        On Error GoTo 0

        If es <> 0 Then             ' HTTP transfer or LoadBuffer method failed
            ' {error handling}
            Exit Sub
        End If

        While Not HZ.EOF
            ' {process the HTML}
            HZ.Next
        Wend

        HZ.Reset
    End Sub

The contents of http://poit.narf.org/brain/index.htm will arrive in the Byte Array AsyncProp.Value; the LoadBuffer method lets you use this array as input to the HtmlZap control.

Property MaxParam

  • Integer
  • Read-only

Returns the number of parameters for the current tag. Returns 0 for text slices or if the current tag has no parameters.

To display all the current tag's parameter names and values in the Debug window, you might do something like this:

    Dim n As Integer, nmax As Integer

    nmax = HZ.MaxParam - 1          ' indexes run from 0 to MaxParam - 1

    For n = 0 To nmax
        Debug.Print HZ.PName(n); "="; HZ.PValue(n)
    Next n
 

Method Next

  • Usage: object.Next

Moves on to the next slice in the HTML file. All of the control's properties are updated to reflect the new slice, including the Position property. The EOF property will change to True if this is the last slice in the file.

Property Param

  • String
  • Read-write. Usage: string = object.Param(paramNameString)

Retrieves or sets a tag parameter by its name. For example, if you were processing an <IMG> tag, you might do:

    Debug.Print "Image File is "; HZ.Param("src")

This will retrieve the <IMG> tag's src= parameter. Take a look at the PName property for an example.

You can also use the Param property to modify a tag. For example, if the current tag were:

    <a href="http://www.yahoo.com/">

and you executed the statements:

    HZ.Param("href") = "http://www.google.com/"
    HZ.Param("target") = "_new"

the ToString property would look like this:

    a href="http://www.google.com/" target="_new"

Note that (for historical reasons) the ToString property returns the tag without the enclosing <> brackets.

Property Percent

  • Integer
  • Read-only

Indicates progress through the current HTML file in percent. Good for progress bars, etc.
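
Here's a minimal sketch of driving a progress bar from Percent. It assumes a ProgressBar control named ProgressBar1 (from the Microsoft Windows Common Controls) on the same form and assumes Percent runs from 0 to 100; both the control name and the procedure name are made up for the example.

```vb
' Hypothetical helper: processes a file while updating ProgressBar1.
Private Sub ProcessWithProgress(ByVal fileName As String)
    Dim lastPct As Integer

    HZ.Load fileName
    ProgressBar1.Min = 0
    ProgressBar1.Max = 100
    lastPct = -1

    While Not HZ.EOF
        ' {process the current slice}

        ' Only touch the bar when the percentage actually changes
        If HZ.Percent <> lastPct Then
            lastPct = HZ.Percent
            ProgressBar1.Value = lastPct
        End If
        HZ.Next
    Wend

    ProgressBar1.Value = 100
    HZ.Reset
End Sub
```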

Property PName

  • String
  • Read-only. Usage: string = HZ.PName(indexInteger)

Retrieves a parameter name from the current tag by its index. Parameter indexes range from 0 to MaxParam - 1.

For example, for the tag:

    <img src="pinky.gif" width=100 height=200>

you'd get (HZ.MaxParam = 3):

    Param                            PName                     PValue
    HZ.Param("src") = "pinky.gif"    HZ.PName(0) = "src"       HZ.PValue(0) = "pinky.gif"
    HZ.Param("width") = "100"        HZ.PName(1) = "width"     HZ.PValue(1) = "100"
    HZ.Param("height") = "200"       HZ.PName(2) = "height"    HZ.PValue(2) = "200"

Property Position

  • Long Integer
  • Read-write. Usage: positionvarLong = HZ.Position or HZ.Position = positionvarLong

Retrieves or sets the current position in the file, measured in bytes from the beginning of the file. Can be used to save and restore positions if you need to jump forward or backwards in the file.
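
As an illustration of saving and restoring a position, the sketch below peeks ahead to the next tag and then jumps back. The function name PeekNextTagName is hypothetical, and the code assumes that assigning a previously read Position value returns the control to the slice that was current when the value was read, which this document implies but doesn't spell out.

```vb
' Hypothetical helper: returns the name of the next tag without
' losing the caller's place in the file ("" if there is none).
Private Function PeekNextTagName() As String
    Dim savedPos As Long

    If HZ.EOF Then Exit Function

    savedPos = HZ.Position          ' remember where we are
    HZ.Next

    Do While Not HZ.EOF
        If HZ.IsTag Then Exit Do    ' found the next tag
        HZ.Next
    Loop

    If HZ.IsTag Then PeekNextTagName = HZ.TagName

    HZ.Position = savedPos          ' jump back to the saved slice
End Function
```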

Property PValue

  • String
  • Read-only. Usage: string = HZ.PValue(indexInteger)

Retrieves a parameter value from the current tag by its index. Parameter indexes range from 0 to MaxParam - 1.

See the PName property for an example.

Method RemoveParam

  • Usage: object.RemoveParam(paramNameString)

Removes the parameter indicated by paramNameString from the current tag. After you remove a parameter you can retrieve the modified tag with the ToString property.
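
For instance, something along these lines would strip one parameter from the current tag and hand back the rebuilt tag. The function name StripParam is invented here. Because this document doesn't say what RemoveParam does when the parameter is missing, the sketch checks for it with PName first.

```vb
' Hypothetical helper: removes paramName from the current tag (if present)
' and returns the tag with its <> brackets restored.
Private Function StripParam(ByVal paramName As String) As String
    Dim n As Integer

    If Not HZ.IsTag Then Exit Function      ' nothing to do for text slices

    For n = 0 To HZ.MaxParam - 1
        If StrComp(HZ.PName(n), paramName, vbTextCompare) = 0 Then
            HZ.RemoveParam paramName
            Exit For
        End If
    Next n

    ' ToString omits the enclosing brackets, so add them back
    StripParam = "<" & HZ.ToString & ">"
End Function
```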

Method Reset

  • Usage: object.Reset

Closes the current HTML file, frees memory, clears properties, and so forth. Returns the control to its initial state, ready to process another file.

Method Rewind

  • Usage: object.Rewind

Sets the control back to the beginning of the current file.

Property TagName

  • String
  • Read-only

Returns the name of the current tag (e.g., "h", "p", "img", etc.). Tag names are always converted to lower case. Returns the empty string if the current slice is text.

Important Note! Do not confuse HtmlZap's TagName property with the Tag property managed by Visual Basic. Under VB, every HtmlZap control will have both a Tag and a TagName property, but the TagName property is where the names of tags will actually appear. Sadly, ActiveX controls have no access to the Tag property (it's generated and managed by VB), or I would have used it!
 

Property Text

  • String
  • Read-only

Returns text found between tags when the current slice is text (IsTag is False). Returns the empty string if the current slice is a tag.

Property ToString

  • String
  • Read-only

Returns the current tag and all parameters in string form, without leading and trailing <>'s. Take a look at the sample source code to see how this property is used.

You can use this property to retrieve the current version of the current tag at any time, whether you have modified it (using the Param property, for example) or not.
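
One way to put ToString to work is a filter that copies a file while modifying some of its tags. The sketch below is not the shipped sample code: the procedure name and file arguments are made up, and it assumes that ToString reproduces unmodified tags (including closing tags and any trailing "/") faithfully once the <> brackets are added back.

```vb
' Hypothetical filter: copies inFile to outFile, adding target="_new"
' to every <a> tag that has parameters.
Private Sub CopyWithNewTargets(ByVal inFile As String, ByVal outFile As String)
    Dim f As Integer

    HZ.Load inFile                  ' load first, so a failure leaves no open file
    f = FreeFile
    Open outFile For Output As #f

    While Not HZ.EOF
        If HZ.IsTag Then
            ' Skip parameterless tags so a closing tag is never modified
            If HZ.TagName = "a" And HZ.MaxParam > 0 Then
                HZ.Param("target") = "_new"
            End If
            Print #f, "<" & HZ.ToString & ">";
        Else
            Print #f, HZ.Text;      ' trailing ; suppresses the newline
        End If
        HZ.Next
    Wend

    Close #f
    HZ.Reset
End Sub
```

CompressWS is left at its default of False here so the output keeps the original file's whitespace.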

Property URLCanonicalize

  • String
  • Usage: string = HZ.URLCanonicalize(urlString, escapeFlag)

Converts a URL to canonical form, optionally escaping "non-URL-safe" characters (if escapeFlag is nonzero). This is just a wrapper around the OS function URLCanonicalize. It's probably most useful in scripting languages like VBScript or ASP.

For more information consult the Microsoft documentation.

Property URLCombine

  • String
  • Usage: string = HZ.URLCombine(baseURLString, relativeURLString, escapeFlag)

Combines a base and a relative URL, optionally escaping "non-URL-safe" characters (if escapeFlag is nonzero). This is just a wrapper around the OS function URLCombine. It's probably most useful in scripting languages like VBScript or ASP.

For more information consult the Microsoft documentation.
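
As a quick sketch, URLCombine can turn the relative links in a page into absolute ones. The procedure name PrintAbsoluteLinks and the baseURL argument are hypothetical, and the code assumes that Param returns an empty string when a tag has no href parameter.

```vb
' Hypothetical helper: prints the absolute form of every link in a file.
Private Sub PrintAbsoluteLinks(ByVal fileName As String, ByVal baseURL As String)
    Dim href As String

    HZ.Load fileName

    While Not HZ.EOF
        If HZ.IsTag Then
            If HZ.TagName = "a" Then
                href = HZ.Param("href")
                If Len(href) > 0 Then
                    ' escapeFlag = 0: don't escape "non-URL-safe" characters
                    Debug.Print HZ.URLCombine(baseURL, href, 0)
                End If
            End If
        End If
        HZ.Next
    Wend

    HZ.Reset
End Sub
```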

Last revised: 26 January 2013