File /nasmail/functions/htmlfilter.php

Description

HTML filtering routines

This script contains a modified version of the HTML filtering library written by Konstantin Riabitsev at Duke University. The library is licensed under the LGPL. It was updated by SquirrelMail developers to counter XSS issues that the original library did not sanitize.

This script contains modifications ported from SquirrelMail 1.4.9+. See commits tagged with SM-PATCH keyword. Modifications are copyrighted by the SquirrelMail Project Team. Copyright (c) 2007 The SquirrelMail Project Team

Functions
magicHTML (line 1285)

This is a wrapper function to call html sanitizing routines.

  • return: a string with HTML that is safe to display in the browser.
string magicHTML (string $body, int $id, object Message $message, [string $mailbox = 'INBOX'])
  • string $body: the body of the message
  • int $id: the id of the message
  • object Message $message: NaSMail Message object
  • string $mailbox: current mailbox
sq_body2div (line 1078)

This function changes the <body> tag into a <div> tag since we can't really have a body-within-body.

  • return: a modified array of attributes to be set for <div>
array sq_body2div (array $attary, string $mailbox, object Message $message, int $id)
  • array $attary: an array of attributes and values of <body>
  • string $mailbox: mailbox we're currently reading (for cid2http)
  • object Message $message: current message (for cid2http)
  • int $id: current message id (for cid2http)
sq_casenormalize (line 227)

A small helper function to use with array_walk. Modifies a by-ref value and makes it lowercase.

  • return: nothing, since it modifies a by-ref value.
void sq_casenormalize (string &$val)
  • string &$val: a value passed by-ref.
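The array_walk usage mentioned above can be sketched as follows. The helper name lowercase_in_place and the sample array are hypothetical stand-ins chosen for illustration; this is not the library's own code, only a minimal example of a by-reference callback.

```php
<?php
// Hypothetical stand-in for sq_casenormalize: lowercases a value in place.
function lowercase_in_place(&$val)
{
    $val = strtolower($val);
}

// Sample data (assumed for illustration): a list of attribute names.
$names = array('HREF', 'OnClick', 'Style');

// array_walk hands each element to the callback by reference,
// so the array itself is modified and nothing needs to be returned.
array_walk($names, 'lowercase_in_place');

print_r($names);
```

Because the callback takes its argument by reference, the normalized values end up in the original array.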
sq_cid2http (line 1013)

This function converts cid: URLs into ones that can be viewed in the browser.

  • return: a string with an http-friendly URL
string sq_cid2http (object Message $message, int $id, string $cidurl, string $mailbox)
  • object Message $message: the message object
  • int $id: the message id
  • string $cidurl: the cid: URL.
  • string $mailbox: the message mailbox
sq_deent (line 592)

Translates entities into literal values so they can be checked.

  • return: True or False depending on whether there were matches.
boolean sq_deent (string &$attvalue, string $regex, [boolean $hex = false])
  • string &$attvalue: the by-ref value to check.
  • string $regex: the regular expression to check against.
  • boolean $hex: whether the entities are hexadecimal.
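A minimal sketch of this kind of entity translation is shown below. The function name decode_entities is hypothetical, and the sketch assumes that the regular expression captures the numeric part of the entity in its first group and that every code point fits in a single byte; neither assumption is stated in this documentation.

```php
<?php
// Hypothetical helper modelled on the description of sq_deent.
// Assumption: $regex captures the numeric part of each entity in group 1.
function decode_entities(&$attvalue, $regex, $hex = false)
{
    $matches = array();
    if (!preg_match_all($regex, $attvalue, $matches)) {
        return false; // no entities found, value left untouched
    }
    $repl = array();
    foreach ($matches[0] as $i => $entity) {
        $numval = $hex ? hexdec($matches[1][$i]) : (int) $matches[1][$i];
        // chr() only handles single bytes; larger code points are out of scope here.
        $repl[$entity] = chr($numval);
    }
    $attvalue = strtr($attvalue, $repl);
    return true;
}

$decimal = '&#106;avascript';
$found_dec = decode_entities($decimal, '/&#(\d+);/');

$hexval = '&#x6A;avascript';
$found_hex = decode_entities($hexval, '/&#x([0-9a-f]+);/i', true);

echo $decimal, "\n", $hexval, "\n";
```

Both sample values decode to the same literal string, which is what lets later checks see through entity-encoded attribute values.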
sq_defang (line 32)

This function checks attribute values for entity-encoded values and returns them translated into 8-bit strings so we can run checks on them.

  • return: Nothing, modifies a reference value.
void sq_defang (string &$attvalue)
  • string &$attvalue: A string to run entity check against.
sq_findnxreg (line 282)

This function takes a PCRE-style regexp and tries to match it within the string.

  • return: false if no matches are found, or an array with the following members:
    • integer with the location of the match within $body
    • string with whatever content between offset and the match
    • string with whatever it is we matched
boolean|array sq_findnxreg (string $body, int $offset, string $reg)
  • string $body: The string to look for needle in.
  • int $offset: Start looking from here.
  • string $reg: A PCRE-style regex to match.
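One way to produce the three-member array described above is sketched here. The name find_next_regex is hypothetical, and the choice of % as the pattern delimiter is an assumption of this sketch, so the supplied expression must not contain an unescaped %.

```php
<?php
// Hypothetical helper modelled on the description of sq_findnxreg.
function find_next_regex($body, $offset, $reg)
{
    $matches = array();
    // Group 1 lazily captures everything before the first match,
    // group 2 captures the match itself. The "s" flag lets "." span newlines.
    $rule = '%^(.*?)(' . $reg . ')%s';
    if (!preg_match($rule, substr($body, $offset), $matches)) {
        return false;
    }
    return array(
        $offset + strlen($matches[1]), // location of the match within $body
        $matches[1],                   // content between offset and the match
        $matches[2],                   // the matched text
    );
}

$result = find_next_regex('abc  def', 0, '\s+');
$nothing = find_next_regex('abc', 0, '\d');

var_dump($result, $nothing);
```

Callers need a strict comparison against false, since a valid result is always a non-empty array.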
sq_findnxstr (line 261)

This function looks for the next character within a string. It's really just a glorified "strpos", except that it handles failures gracefully.

  • return: location of the next occurrence of the needle, or strlen($body) if the needle wasn't found.
int sq_findnxstr (string $body, int $offset, string $needle)
  • string $body: The string to look for needle in.
  • int $offset: Start looking from this position.
  • string $needle: The character/string to look for.
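The behaviour described above, a strpos call that falls back to the string length, can be sketched as follows. The name find_next_str is a hypothetical stand-in and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper modelled on the description of sq_findnxstr.
function find_next_str($body, $offset, $needle)
{
    $pos = strpos($body, $needle, $offset);
    if ($pos === false) {
        // Not found: report the end of the string instead of false,
        // so callers can keep doing integer arithmetic on the result.
        $pos = strlen($body);
    }
    return $pos;
}

$body = '<b>bold</b>';
$found = find_next_str($body, 0, '>');   // position of the first ">"
$missing = find_next_str($body, 3, '#'); // "#" never occurs in $body

var_dump($found, $missing);
```

Returning strlen($body) rather than false means a parser loop can treat "not found" as "end of input" without a separate type check.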
sq_fixatts (line 622)

This function runs various checks against the attributes.

  • return: Array with modified attributes.
array sq_fixatts (string $tagname, array $attary, array $rm_attnames, array $bad_attvals, array $add_attr_to_tag, object Message $message, int $id, $mailbox)
  • string $tagname: String with the name of the tag.
  • array $attary: Array with all tag attributes.
  • array $rm_attnames: See description for sq_sanitize
  • array $bad_attvals: See description for sq_sanitize
  • array $add_attr_to_tag: See description for sq_sanitize
  • object Message $message: message object
  • int $id: message id
  • $mailbox
sq_fixIE_idiocy (line 76)

Fixes sanitizing issues caused by abuse of W3.org specifications in Microsoft Internet Explorer.

Translates dangerous Unicode characters which are accepted by IE as regular characters. The original SquirrelMail function has been adapted to the NaSMail layout.

  • return: Nothing, modifies a reference value.
  • author: Marc Groot Koerkamp.
void sq_fixIE_idiocy (string &$attvalue)
  • string &$attvalue: The attribute value before dangerous characters are translated.
sq_fixstyle (line 869)

This function edits style definitions to make them friendly and usable in NaSMail.

  • return: a string with edited content.
string sq_fixstyle (string $body, int $pos, object Message $message, int $id, string $mailbox)
  • string $body: message text
  • int $pos: current position
  • object Message $message: NaSMail Message object
  • int $id: the message id
  • string $mailbox: the message mailbox
sq_fix_url (line 747)

This function filters URLs.

  • since: 1.1 (sm 1.4.10)
void sq_fix_url (string $attname, string &$attvalue, object Message $message, int $id, string $mailbox, [string $sQuote = '&quot;'])
  • string $attname: Attribute name
  • string &$attvalue: String with attribute value to filter
  • object Message $message: message object
  • int $id: message id
  • string $mailbox: mailbox
  • string $sQuote: quoting characters around URLs
sq_getnxtag (line 310)

This function looks for the next tag.

  • return: false if no more tags exist in the body, or an array with the following members:
    • string with the name of the tag
    • array with attributes and their values
    • integer with tag type (1, 2, or 3)
    • integer where the tag starts (starting "<")
    • integer where the tag ends (ending ">")
    The first three members will be false if the tag is invalid.
boolean|array sq_getnxtag (string $body, int $offset)
  • string $body: String where to look for the next tag.
  • int $offset: Start looking from here.
sq_sanitize (line 1126)

This is the main function and the one you should actually be calling.

There are several variables you should be aware of and which need special description.

Since the description is quite lengthy, see it here: http://linux.duke.edu/projects/mini/htmlfilter/ or check the magicHTML() function.

  • return: sanitized HTML safe to show on your pages.
string sq_sanitize (string $body, array $tag_list, array $rm_tags_with_content, array $self_closing_tags, boolean $force_tag_closing, array $rm_attnames, array $bad_attvals, array $add_attr_to_tag, object Message $message, int $id, $mailbox)
  • string $body: the string with HTML you wish to filter
  • array $tag_list: see description above
  • array $rm_tags_with_content: see description above
  • array $self_closing_tags: see description above
  • boolean $force_tag_closing: see description above
  • array $rm_attnames: see description above
  • array $bad_attvals: see description above
  • array $add_attr_to_tag: see description above
  • object Message $message: NaSMail message object
  • int $id: message id
  • $mailbox
sq_skipspace (line 241)

This function skips any whitespace from the current position within a string up to the next non-whitespace character.

  • return: the location within $body where the next non-whitespace character is located.
int sq_skipspace (string $body, int $offset)
  • string $body: the string
  • int $offset: the offset within the string where we should start looking for the next non-whitespace character.
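The whitespace skip described above can be sketched with strspn. This is a swapped-in technique chosen for brevity, not necessarily how the library does it; the name skip_space and the exact set of characters treated as whitespace are assumptions of this sketch.

```php
<?php
// Hypothetical helper modelled on the description of sq_skipspace.
function skip_space($body, $offset)
{
    // strspn counts the run of mask characters starting at $offset,
    // so adding it to $offset lands on the next non-whitespace character.
    return $offset + strspn($body, " \t\r\n\f\v", $offset);
}

$after_gap = skip_space('a   b', 1); // starts on the first space
$no_gap = skip_space('ab', 0);       // already on a non-whitespace character

var_dump($after_gap, $no_gap);
```

When the offset already points at a non-whitespace character, the run length is zero and the offset comes back unchanged.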
sq_tagprint (line 200)

This function returns the final tag out of the tag name, an array of attributes, and the type of the tag. This function is called by sq_sanitize internally.

  • return: a string with the final tag representation.
string sq_tagprint (string $tagname, array $attary, int $tagtype)
  • string $tagname: the name of the tag.
  • array $attary: the array of attributes and their values
  • int $tagtype: The type of the tag:
    • 1 (default) - Opening tag, e.g.: <a href="blah">
    • 2 - Closing tag, e.g.: </a>
    • 3 - XHTML-style content-less tag, e.g.: <img src="blah" />
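The three tag types listed above can be illustrated with a small sketch. The name print_tag is hypothetical, and the sketch assumes that attribute values arrive already quoted and that any type other than 2 or 3 is treated as an opening tag.

```php
<?php
// Hypothetical helper modelled on the description of sq_tagprint.
function print_tag($tagname, $attary, $tagtype)
{
    if ($tagtype == 2) {
        // Closing tags carry no attributes.
        return '</' . $tagname . '>';
    }
    $fulltag = '<' . $tagname;
    if (is_array($attary) && count($attary) > 0) {
        $atts = array();
        foreach ($attary as $attname => $attvalue) {
            // Assumption: $attvalue already includes its surrounding quotes.
            $atts[] = $attname . '=' . $attvalue;
        }
        $fulltag .= ' ' . implode(' ', $atts);
    }
    if ($tagtype == 3) {
        // XHTML-style content-less tag.
        $fulltag .= ' /';
    }
    return $fulltag . '>';
}

$opening = print_tag('a', array('href' => '"blah"'), 1);
$closing = print_tag('a', array(), 2);
$selfclosed = print_tag('img', array('src' => '"blah"'), 3);

echo $opening, "\n", $closing, "\n", $selfclosed, "\n";
```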
sq_unspace (line 58)

Kill any tabs, newlines, or carriage returns. Our friends the makers of the browser with 95% market share decided that it'd be funny to make "java[tab]script" be just as good as "javascript".

  • return: Nothing, modifies a reference value.
void sq_unspace (string &$attvalue)
  • string &$attvalue: The attribute value before extraneous spaces are removed.
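The "java[tab]script" case mentioned above can be illustrated with a short sketch. The name strip_control_whitespace is hypothetical, and the sketch removes only the three characters the description names (tabs, newlines, and carriage returns); the library may strip more.

```php
<?php
// Hypothetical helper modelled on the description of sq_unspace.
function strip_control_whitespace(&$attvalue)
{
    // Remove the characters a lenient browser would silently ignore.
    $attvalue = str_replace(array("\t", "\r", "\n"), '', $attvalue);
}

$value = "java\tscript:alert(1)";
strip_control_whitespace($value);

echo $value, "\n";
```

After the call, a plain string comparison against "javascript:" is enough to detect the obfuscated scheme.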

Documentation generated on Sun, 22 Nov 2009 17:36:37 +0200 by phpDocumentor 1.4.3