[Zenoss-dev] Fwd: Fix to BeautifulSoup link parsing
Duncan McGreggor
duncan at zenoss.com
Thu Nov 2 12:31:14 EST 2006
Regarding the bug I found in twill the other day (and mentioned to
Erik briefly): Titus responded in another email and then sent this
on to the mechanize folks. It looks like the issue is in mechanize,
and Titus already has a patch for it.
Begin forwarded message:
> From: Titus Brown <titus at caltech.edu>
> Date: November 1, 2006 11:14:00 PM MST
> To: wwwsearch-general at lists.sf.net
> Cc: Duncan McGreggor <duncan at zenoss.com>
> Subject: Fix to BeautifulSoup link parsing
> Reply-To: titus at idyll.org
>
> Hi folks,
>
> Duncan McGreggor found a nice little bug in mechanize that manifested
> differently in twill and zope.testbrowser.
>
> Briefly, parsing links of the form
>
> <a href="link">
> <span>some text</span></a>
>
> fails in mechanize currently. The fix is to use 'link.fetchText'
> instead of 'link.firstText' in mechanize/_html.py, line 382. What
> appears to be happening is that 'firstText' just grabs the newline
> before <span>, without also grabbing 'some text'.
>
> cheers,
> --titus
>
> Patch against svn latest:
>
> Index: mechanize/_html.py
> ===================================================================
> --- mechanize/_html.py (revision 34052)
> +++ mechanize/_html.py (working copy)
> @@ -378,7 +378,8 @@
>      if not url:
>          continue
>      url = clean_url(url, encoding)
> -    text = link.firstText(lambda t: True)
> +    text = link.fetchText(lambda t: True)
> +    text = " ".join(text)
>      if text is _beautifulsoup.Null:
>          # follow _pullparser's weird behaviour rigidly
>          if link.name == "a":
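
A quick way to see the difference Titus describes without a mechanize
checkout: the sketch below is not BeautifulSoup and does not call
'firstText' or 'fetchText'. It assumes Python 3 and uses the standard
html.parser module to collect the text nodes inside the link, then
compares taking only the first node (roughly what 'firstText' appears
to be doing here) with joining all of them (what the patch does with
the result of 'fetchText'). The names LinkTextCollector and link_texts
are made up for illustration.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect every text node found inside <a> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are somewhere inside an <a> element
        self.texts = []   # text nodes in document order

    def handle_starttag(self, tag, attrs):
        # Start counting at <a>, and keep counting for tags nested in it.
        if tag == "a" or self.depth:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)


def link_texts(html):
    """Return the list of text nodes found inside the link markup."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


# The markup from the bug report: a newline sits between <a> and <span>.
html = '<a href="link">\n<span>some text</span></a>'
texts = link_texts(html)

first_only = texts[0]       # stand-in for the old behaviour: first text node only
joined = " ".join(texts)    # stand-in for the patched behaviour: all nodes, joined

print(repr(first_only))
print(repr(joined))
```

The first text node is just the newline before <span>, so the link
text comes out blank; joining all the nodes recovers 'some text'
(with leading whitespace that a caller would probably want to strip).
One thing to watch: a joined value is always a string, so the
'text is _beautifulsoup.Null' test in the context line after the
change would presumably no longer match and may need to become an
emptiness check.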