Revision 2346 (checked in by miyagawa, 13 years ago)

Revision history for Perl extension Web::Scraper

0.15  Sat Sep 15 21:28:10 PDT 2007
        - Call env_proxy in scraper CLI
        - Added $Web::Scraper::UserAgent and the $scraper->user_agent accessor to deal
          with the UserAgent object
        - Don't escape non-ASCII characters into &#xXXXX; in scraper shell 's' and WARN

0.14  Fri Sep 14 16:06:20 PDT 2007
        - Fix bin/scraper to work with older Term::ReadLine.
          (Thanks to Tina Müller [RT:29079])
        - Link attributes like img@src and a@href are now automatically
          converted to absolute URIs, using the current URI as the base.
          Only effective when you call $s->scrape(URI) or $s->scrape(\$html, URI)
        - Added 'HTML' and its alias 'RAW' to get the HTML chunk inside the tag
            process "script", "code" => 'RAW';
          Handy if you want the raw HTML code inside <script> or <style>.
          (Thanks to charsbar for the suggestion)
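
          A minimal sketch of the two 0.14 additions above, assuming the
          HTML is passed as a scalar reference together with a base URI.
          The page content and the example.com base are made up for
          illustration, and the exact whitespace of the 'RAW' value may differ.

```perl
use strict;
use warnings;
use URI;
use Web::Scraper;

# Hypothetical page content and base URI, for illustration only.
my $html = <<'HTML';
<html><body>
<a href="/about">About</a>
<img src="logo.png">
<script>var x = 1;</script>
</body></html>
HTML
my $base = URI->new("http://example.com/docs/");

my $s = scraper {
    process "a",      "link"  => '@href';  # relative href, resolved against $base
    process "img",    "image" => '@src';   # relative src, resolved against $base
    process "script", "code"  => 'RAW';    # HTML chunk inside <script>
};

# Passing the base URI as the second argument enables the conversion.
my $res = $s->scrape(\$html, $base);

print "$res->{link}\n";
print "$res->{image}\n";
print "$res->{code}\n";
```

          Without the second argument to scrape(), the attribute values
          are returned as they appear in the HTML.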

0.13  Sun Sep  2 17:11:08 PDT 2007
        - Added 'c' and 'c all' commands to scraper to generate the
          code to replay the session
        - Added 'WARN' as a shortcut for sub { warn $_->as_HTML } in the scraper shell, like:
            process "a", WARN; # print 'a' elements as HTML
        - Added 'search-cpan.pl' and 'rel-tag.pl' to eg/

0.12  Thu Aug 30 02:39:44 PDT 2007
        - Added 's' command to scraper to get the HTML source
        - You can use the $tree variable to deal with the HTML::Element object in the scraper shell
        - Give a graceful error message if the given Selector/XPath doesn't compile
        - Give a better error when the number of args in process() seems wrong

0.11  Tue Aug 28 02:50:01 PDT 2007
        - Support hash references in process values, like
          process "a", "people[]", { link => '@href', name => 'TEXT' };
          See t/09-process_hash.t for its usage.
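
          A short sketch of the hash-reference form above. The HTML
          snippet and the names in it are hypothetical; each matched
          element is expected to yield one hash with the requested keys.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical input, for illustration only.
my $html = <<'HTML';
<ul>
  <li><a href="http://example.com/alice">Alice</a></li>
  <li><a href="http://example.com/bob">Bob</a></li>
</ul>
HTML

my $s = scraper {
    # One hash per matched 'a' element, collected into the 'people' array.
    process "a", "people[]", { link => '@href', name => 'TEXT' };
};

my $res = $s->scrape(\$html);

for my $person (@{ $res->{people} }) {
    print "$person->{name}: $person->{link}\n";
}
```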

0.10  Mon Aug 27 00:53:51 PDT 2007
        - result now returns the entire stash if called without keys
        - added bin/scraper CLI

0.09  Wed Aug 15 10:51:14 PDT 2007
        - remove Devel::Leak use from tests

0.08  Tue Aug 14 13:25:16 PDT 2007
        - Call $tree->delete after the callback to avoid memory leaks by TreeBuilder.
          (Thanks to k.daiba for the report)

0.07  Sat May 12 16:23:51 PDT 2007
        - Updated dependencies for HTML::TreeBuilder::XPath

0.06  Sat May 12 15:47:27 PDT 2007
        - No longer use decoded_content, to work with the new HTTP::Response::Encoding

0.05  Wed May  9 18:21:22 PDT 2007
        - Added (less DSL-ish) Web::Scraper->define(sub { ... }) syntax
        - Fixed bug where the module dies if there's no encoding found in HTTP response headers
        - Added more examples in eg/
        - When getting a value using a callback, pass the HTML::Element object as $_, in addition to $_[0]
          (Suggested by Matt S. Trout)
        - If the expression (1st argument to process()) starts with "/", it's
          treated as a direct XPath and no Selector-to-XPath conversion is done.
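
          A sketch combining the last two 0.05 items: an expression
          starting with "/" used as XPath, and a callback that reads the
          element from both $_[0] and $_. The HTML and class names are
          hypothetical.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical input, for illustration only.
my $html = <<'HTML';
<ul>
  <li class="item">one</li>
  <li class="other">two</li>
  <li class="item">three</li>
</ul>
HTML

my $s = scraper {
    # Starts with "/", so it is treated as XPath, not as a CSS selector.
    process '//li[@class="item"]', "items[]" => sub {
        # The element arrives as $_[0] and is also available as $_.
        return $_[0]->tag . ":" . $_->as_text;
    };
};

my $res = $s->scrape(\$html);
print "$_\n" for @{ $res->{items} };
```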

0.04  Wed May  9 00:55:32 PDT 2007
        - *API CHANGE* scraper {} now returns a Web::Scraper object, not a closure.
          You should call ->scrape() to get the response back.
          (Suggested by Marcus Ramberg)

          I loved the code returning a closure, but this is more compatible with
          the scrapi.rb API and hopefully less confusing to people.
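
          For illustration, a minimal sketch of the object-based API
          described above, assuming a small hypothetical HTML string as
          input:

```perl
use strict;
use warnings;
use Web::Scraper;

# scraper {} builds an object; nothing is scraped yet.
my $s = scraper {
    process "title", "title" => 'TEXT';
};
print "got a Web::Scraper object\n" if $s->isa('Web::Scraper');

# Hypothetical input, for illustration only.
my $html = "<html><head><title>Hello</title></head><body></body></html>";

# Call ->scrape() on the object to get the result back.
my $res = $s->scrape(\$html);
print "$res->{title}\n";
```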

0.03  Tue May  8 23:04:13 PDT 2007
        - use 'TEXT' rather than 'content' to grab text from element
          to be more compatible with scrapi
        - Added unit tests using Test::Base
        - Refactored internal code for easier reading
        - chained callbacks are now passed HTML::Element, not HTML, to avoid double HTML parsing
        - Implemented callbacks (iterator) API
        - Added 'process_first' to be compatible with scrapi

0.02  Tue May  8 20:03:37 PDT 2007
        - Added dependencies to Makefile.PL

0.01  Tue May  8 04:05:59 2007
        - original version