Ignored URLs

The URLs in web documents may contain elements that change on every request, e.g. session-specific data (...&session_id=bbag2f2d82ec2&...). If a web document monitored by GooDiff contains such ever-changing URLs, GooDiff will report a change that is syntactically real but semantically meaningless: the text of the document differs, but its actual content has not changed.

To counter this problem, GooDiff is configured to treat such URLs in monitored web documents in a special way: it keeps the hyperlink but replaces its target with a link to this page, i.e. http://www.goodiff.org/wiki/IgnoredUrls. The link text remains unchanged.
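The replacement described above can be sketched in Python as follows. This is not GooDiff's actual code: the pattern list IGNORED_PATTERNS (here, anything carrying a session_id query parameter) and the helpers is_ignored and rewrite_links are hypothetical, and the anchor matcher is a simple regular expression rather than a full HTML parser. The Markdown output format follows the example further down this page.

```python
import re

# Target that replaces every ignored URL (this wiki page).
IGNORED_URLS_PAGE = "http://www.goodiff.org/wiki/IgnoredUrls"

# Hypothetical patterns marking a URL as "ignored"; GooDiff's real
# configuration may differ.
IGNORED_PATTERNS = [re.compile(r"[?&]session_id=")]

# Simplistic matcher for <a href="...">text</a>; not a full HTML parser.
ANCHOR_RE = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL)


def is_ignored(url: str) -> bool:
    """Return True if the URL matches any ignored pattern."""
    return any(p.search(url) for p in IGNORED_PATTERNS)


def rewrite_links(html: str) -> str:
    """Turn anchors into Markdown links, redirecting ignored URLs."""
    def replace(match: re.Match) -> str:
        url, title = match.group(1), match.group(2)
        if is_ignored(url):
            url = IGNORED_URLS_PAGE  # keep the link text, drop the volatile target
        return f"[{title}]({url})"
    return ANCHOR_RE.sub(replace, html)


if __name__ == "__main__":
    src = ('<a href="http://www.foo.bar/ignored_url/with/session_specific_data/'
           'index.php?session_id=bbag2f2d82ec2">Monitoring legal documents</a>')
    print(rewrite_links(src))
    # prints: [Monitoring legal documents](http://www.goodiff.org/wiki/IgnoredUrls)
```

Links that do not match any pattern keep their original target, so only the volatile URLs lose their destination.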

In the future, we might include more intelligent code that strips only the problematic elements of a URL instead of ignoring the whole URL.
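One possible shape for such code, sketched here purely as an illustration and not as a planned GooDiff feature: remove known volatile query parameters and keep the rest of the URL. The parameter set VOLATILE_PARAMS and the function strip_volatile_params are hypothetical; the sketch uses only Python's standard urllib.parse module.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters that change on every request.
VOLATILE_PARAMS = {"session_id"}


def strip_volatile_params(url: str) -> str:
    """Remove volatile query parameters, keeping the rest of the URL intact."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in VOLATILE_PARAMS]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_volatile_params(
        "http://www.foo.bar/index.php?page=2&session_id=bbag2f2d82ec2&lang=en"))
    # prints: http://www.foo.bar/index.php?page=2&lang=en
```

Note that urlencode re-encodes the remaining parameters, so a URL with unusual escaping may come back in a slightly different but equivalent form.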

Example

The hyperlink

    <a href="http://www.foo.bar/ignored_url/with/session_specific_data/index.php?session_id=bbag2f2d82ec2">Monitoring legal documents</a>

in the source code of the original web document becomes

    [Monitoring legal documents](http://www.goodiff.org/wiki/IgnoredUrls)

in the GooDiff mirror.