root/Web-Scraper/trunk/Changes

Revision 2397 (checked in by miyagawa, 13 years ago)

Checking in changes prior to tagging of version 0.24. Changelog diff is:

=== Changes
==================================================================
--- Changes (revision 9085)
+++ Changes (local)
@@ -1,5 +1,10 @@

Revision history for Perl extension Web::Scraper

+0.24 Sun Nov 25 15:58:38 PST 2007
+ - Support duck typing in filter args to take object that has 'filter' method
+ This could give Web::Scraper::Filter::Pipe a better interface
+ (Thanks to hanekomu and tokuhirom)
+

0.23 Sat Nov 24 17:21:14 PST 2007

- Upped Web::Scraper dependency
- Skip & test until HTML::TreeBuilder::XPath fixes it

Revision history for Perl extension Web::Scraper

0.24  Sun Nov 25 15:58:38 PST 2007
        - Support duck typing in filter args to take object that has 'filter' method
          This could give Web::Scraper::Filter::Pipe a better interface
          (Thanks to hanekomu and tokuhirom)

0.23  Sat Nov 24 17:21:14 PST 2007
        - Upped Web::Scraper dependency
        - Skip & test until HTML::TreeBuilder::XPath fixes it
        - removed eg/search-cpan.pl

0.22  Wed Oct 17 17:51:54 PDT 2007
        - 's' on scraper shell now prints to pager (e.g. less) if PAGER is set

0.21_01 Thu Oct  4 01:05:00 PDT 2007
        - Added an experimental filter support
          (Thanks to hirose31, tokuhirom and Yappo for brainstorming)


0.21  Wed Oct  3 10:37:13 PDT 2007
        - Bumped up HTML::TreeBuilder dependency to fix 12_html.t issues
          [rt.cpan.org #29733]

0.20  Wed Oct  3 00:28:13 PDT 2007
        - Fixed a bug where URI is not absolutized with a hash reference value
        - Added eg/jp-playstation-store.pl

0.19  Thu Sep 20 22:42:30 PDT 2007
        - Try to get HTML encoding from META tags as well, when there's
          no charset value in HTTP response header.

0.18  Thu Sep 20 19:49:11 PDT 2007
        - Fixed a bug where URI is not absolutized when scraper is nested
        - Use as_XML not as_HTML in 'RAW'

0.17  Wed Sep 19 19:12:25 PDT 2007
        - Reverted Term::Encoding support since it causes segfaults
          (double utf-8 encoding) in some environment

0.16  Tue Sep 18 04:48:47 PDT 2007
        - Support 'RAW' and 'TEXT' for TextNode object
        - Call Term::Encoding from scraper shell if installed

0.15  Sat Sep 15 21:28:10 PDT 2007
        - Call env_proxy in scraper CLI
        - Added $Web::Scraper::UserAgent and $scraper->user_agent accessor to deal
          with UserAgent object
        - Don't escape non-ASCII characters into &#xXXXX; in scraper shell 's' and WARN

0.14  Fri Sep 14 16:06:20 PDT 2007
        - Fix bin/scraper to work with older Term::ReadLine.
          (Thanks to Tina Müller [RT:29079])
        - Now link elements like img@src and a@href are automatically
          converted to absolute URI using the current URI as a base.
          Only effective when you do $s->scrape(URI) or $s->scrape(\$html, URI)
        - Added 'HTML' and its alias 'RAW' to get the HTML chunk inside the tag
            process "script", "code" => 'RAW';
          Handy if you want the raw HTML code inside <script> or <style>.
          (Thanks to charsbar for the suggestion)

0.13  Sun Sep  2 17:11:08 PDT 2007
        - Added 'c' and 'c all' command to scraper to generate the
          code to replay the session
        - Added 'WARN' as a shortcut to sub { warn $_->as_HTML } on scraper shell like:
            process "a", WARN; # print 'a' elements as HTML
        - Added 'search-cpan.pl' and 'rel-tag.pl' to eg/

0.12  Thu Aug 30 02:39:44 PDT 2007
        - Added 's' command to scraper to get the HTML source
        - You can use $tree variable to deal with the HTML::Element object in scraper shell
        - Give a graceful error message if the given Selector/XPath doesn't compile
        - Give a better error when number of args in process() seems wrong

0.11  Tue Aug 28 02:50:01 PDT 2007
        - Supported hash-reference in process values, like
          process "a", "people[]", { link => '@href', name => 'TEXT' };
          See t/09-process_hash.t for its usage.

0.10  Mon Aug 27 00:53:51 PDT 2007
        - result now returns the entire stash if called without keys
        - added bin/scraper CLI

0.09  Wed Aug 15 10:51:14 PDT 2007
        - remove Devel::Leak use from tests

0.08  Tue Aug 14 13:25:16 PDT 2007
        - Call $tree->delete after the callback to avoid memory leaks by TreeBuilder.
          (Thanks to k.daiba for the report)

0.07  Sat May 12 16:23:51 PDT 2007
        - Updated dependencies for HTML::TreeBuilder::XPath

0.06  Sat May 12 15:47:27 PDT 2007
        - Now don't use decoded_content to work with new H::R::Encoding

0.05  Wed May  9 18:21:22 PDT 2007
        - Added (less DSL-ish) Web::Scraper->define(sub { ... }) syntax
        - Fixed bug where the module dies if there's no encoding found in HTTP response headers
        - Added more examples in eg/
        - When we get value using callback, pass HTML::Element object as $_, in addition to $_[0]
          (Suggested by Matt S. Trout)
        - If the expression (1st argument to process()) starts with "/", it's
          treated as a direct XPath and no Selector-to-XPath conversion is done.

0.04  Wed May  9 00:55:32 PDT 2007
        - *API CHANGE* Now scraper {} returns Web::Scraper object and not closure.
          You should call ->scrape() to get the response back.
          (Suggested by Marcus Ramberg)

          I loved the code returning closure, but this is more compatible to
          scrapi.rb API and hopefully less confusing to people.

0.03  Tue May  8 23:04:13 PDT 2007
        - use 'TEXT' rather than 'content' to grab text from element
          to be more compatible with scrapi
        - Added unit tests using Test::Base
        - Refactored internal code for easier reading
        - chained callbacks are now passed HTML::Element, not HTML, to avoid double HTML parsing
        - Implemented callbacks (iterator) API
        - Added 'process_first' to be compatible with scrapi

0.02  Tue May  8 20:03:37 PDT 2007
        - Added dependencies to Makefile.PL

0.01  Tue May  8 04:05:59 2007
        - original version
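
The 0.24 entry says filter args may be any object that has a 'filter' method (duck typing). The sketch below shows one way such a check could be written in plain Perl. It is not Web::Scraper's actual code: the names `My::Filter::Upper`, `apply_filter`, and `run_filters` are hypothetical, and accepting a plain code reference alongside objects is an assumption made here for comparison only.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical filter class. It inherits from nothing; having a
# 'filter' method is the only thing that makes it usable as a filter.
package My::Filter::Upper;
sub new    { my ($class) = @_; return bless {}, $class }
sub filter { my ($self, $value) = @_; return uc $value }

package main;

# Apply a single filter to a value. Any blessed object that responds
# to 'filter' is accepted (duck typing); a code reference is also
# accepted in this sketch. Anything else is rejected.
sub apply_filter {
    my ($filter, $value) = @_;
    if (blessed($filter) && $filter->can('filter')) {
        return $filter->filter($value);
    }
    if (ref $filter eq 'CODE') {
        return $filter->($value);
    }
    die "Don't know how to apply filter: $filter\n";
}

# Apply a list of filters from left to right.
sub run_filters {
    my ($value, @filters) = @_;
    $value = apply_filter($_, $value) for @filters;
    return $value;
}

# A code-reference filter that strips leading and trailing whitespace.
my $trim = sub { my ($v) = @_; $v =~ s/^\s+|\s+$//g; return $v };

print run_filters("  hello  ", $trim, My::Filter::Upper->new()), "\n";
```

Checking `can('filter')` rather than `isa(...)` is what lets unrelated classes act as filters without sharing a base class.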