[Zenoss-dev] Fwd: Fix to BeautifulSoup link parsing

Duncan McGreggor duncan at zenoss.com
Thu Nov 2 12:31:14 EST 2006


This is about the bug I found in twill the other day (and mentioned
briefly to Erik):

Titus responded in another email and then sent this on to the
mechanize folks. It looks like the issue is in mechanize, and Titus
already has a patch for it.

Begin forwarded message:

> From: Titus Brown <titus at caltech.edu>
> Date: November 1, 2006 11:14:00 PM MST
> To: wwwsearch-general at lists.sf.net
> Cc: Duncan McGreggor <duncan at zenoss.com>
> Subject: Fix to BeautifulSoup link parsing
> Reply-To: titus at idyll.org
>
> Hi folks,
>
> Duncan McGreggor found a nice little bug in mechanize that manifested
> differently in twill and zope.testbrowser.
>
> Briefly, parsing links of the form
>
> <a href="link">
> <span>some text</span></a>
>
> currently fails in mechanize.  The fix is to use 'link.fetchText'
> instead of 'link.firstText' in mechanize/_html.py, line 382.  It
> appears that 'firstText' grabs only the newline before <span> and
> never reaches 'some text'.
>
> cheers,
> --titus
>
> Patch against svn latest:
>
> Index: mechanize/_html.py
> ===================================================================
> --- mechanize/_html.py  (revision 34052)
> +++ mechanize/_html.py  (working copy)
> @@ -378,7 +378,8 @@
>                  if not url:
>                      continue
>                  url = clean_url(url, encoding)
> -                text = link.firstText(lambda t: True)
> +                text = link.fetchText(lambda t: True)
> +                text = " ".join(text)
>                  if text is _beautifulsoup.Null:
>                      # follow _pullparser's weird behaviour rigidly
>                      if link.name == "a":
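
To illustrate the behaviour Titus describes, here is a minimal sketch that
uses only Python's standard html.parser module, not mechanize or
BeautifulSoup. The class and function names (LinkTextCollector,
link_text_nodes) are hypothetical and exist only for this example. It
collects every text node inside an <a> element, then compares taking the
first node (roughly what 'firstText' does) with joining all of them
(roughly what the patched 'fetchText' plus join does).

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Hypothetical helper: collect the text nodes found inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.links = []  # one list of text nodes per <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.links.append([])

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Whitespace-only nodes (such as the newline before <span>) count too.
        if self.in_link:
            self.links[-1].append(data)


def link_text_nodes(html):
    """Return a list of text-node lists, one per link in the document."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return parser.links


html = '<a href="link">\n<span>some text</span></a>'
nodes = link_text_nodes(html)[0]

first_only = nodes[0]     # analogous to firstText: just the newline
joined = " ".join(nodes)  # analogous to fetchText followed by " ".join

print(repr(first_only))
print(repr(joined.strip()))
```

One thing the sketch makes visible: after " ".join the result is always a
string, so a later identity check against a Null sentinel, as in the
context lines of the patch, would presumably never match. Whether that
matters depends on the rest of mechanize/_html.py, which is not shown here.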



