Revision 429 (checked in by miyagawa, 18 years ago)

rewrite

NAME
    Email::Find - Find RFC 822 email addresses in plain text

SYNOPSIS
      use Email::Find;

      # new object oriented interface
      my $finder = Email::Find->new(\&callback);
      my $num_found = $finder->find(\$text);

      # good old functional style
      $num_found = find_emails($text, \&callback);

DESCRIPTION
    Email::Find is a module for finding a *subset* of RFC 822 email
    addresses in arbitrary text (see the section on "CAVEATS"). The
    addresses it finds are not guaranteed to exist, or even to be email
    addresses at all, but they will be syntactically valid under RFC 822.

    Email::Find applies some heuristics to avoid the more obvious red
    herrings and false addresses, but there's only so much which can be
    done without a human.

METHODS
    new
          $finder = Email::Find->new(\&callback);

        Constructs a new Email::Find object. The specified callback is
        called with each email address as it is found.

    find
          $num_emails_found = $finder->find(\$text);

        Takes a reference to the text, finds the email addresses in it,
        executes the registered callback for each one, and returns the
        number of addresses found.

        The callback is given two arguments. The first is a Mail::Address
        object representing the address found. The second is the actual
        original email as found in the text. Whatever the callback returns
        will replace the original text.

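        As a small illustration of the callback contract described above,
        the sketch below rewrites each address into a form that is harder
        to harvest. The sample text and the "user at host" output format
        are assumptions made for this example; user() and host() are
        accessors of the Mail::Address object passed to the callback.

```perl
use strict;
use warnings;
use Email::Find;

# Sample input; the addresses are made up for illustration.
my $text = 'Write to alice@example.com or bob@example.org for details.';

# The callback receives a Mail::Address object and the original matched
# text. Whatever it returns is substituted for the match in $text.
my $finder = Email::Find->new(sub {
    my ($email, $orig_email) = @_;
    return $email->user . ' at ' . $email->host;
});

# find() takes a reference, so $text itself is modified.
my $num_found = $finder->find(\$text);

print "$num_found address(es) rewritten\n";
print "$text\n";
```

        Returning $orig_email instead would leave the text untouched while
        still counting the matches.
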
FUNCTIONS
        For backward compatibility, Email::Find exports one function,
        find_emails(). It works much like URI::Find's find_uris().

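        A minimal sketch of the functional interface, assuming the calling
        convention shown in the SYNOPSIS (the text first, then the
        callback). The sample text and the @seen array are made up for
        this example; address() is a Mail::Address accessor.

```perl
use strict;
use warnings;
use Email::Find;    # find_emails() is exported, per the text above

# Sample input; the addresses are made up for illustration.
my $text = 'Reports go to carol@example.net; copies to dave@example.com.';

# Collect every address found, leaving the text as it was.
my @seen;
my $num_found = find_emails($text, sub {
    my ($email, $orig_email) = @_;
    push @seen, $email->address;    # bare address as a string
    return $orig_email;             # put the original match back
});

print "Found $num_found: @seen\n";
```

        The callback follows the same rules as with the object interface,
        so returning anything other than $orig_email would rewrite the
        text.
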
EXAMPLES
          use Email::Find;

          # Simply print out all the addresses found, leaving the text undisturbed.
          my $finder = Email::Find->new(sub {
              my($email, $orig_email) = @_;
              print "Found ".$email->format."\n";
              return $orig_email;
          });
          $finder->find(\$text);

          # For each email found, ping its host to see if it's alive.
          require Net::Ping;
          my $ping = Net::Ping->new;
          my %Pinged = ();
          my $finder = Email::Find->new(sub {
              my($email, $orig_email) = @_;
              my $host = $email->host;
              # Ping each host only once.
              $Pinged{$host} = $ping->ping($host)
                  unless exists $Pinged{$host};
              # Return the original text so the input is left unchanged.
              return $orig_email;
          });

          $finder->find(\$text);

          while( my($host, $up) = each %Pinged ) {
              # Parenthesize the ternary; "." binds tighter than "?:".
              print "$host is ", ($up ? 'up' : 'down'), "\n";
          }

          # Count how many addresses are found.
          my $finder = Email::Find->new(sub { $_[1] });
          print "Found ", $finder->find(\$text), " addresses\n";

          # Wrap each address in an HTML mailto link.
          my $finder = Email::Find->new(
              sub {
                  my($email, $orig_email) = @_;
                  my($address) = $email->format;
                  return qq|<a href="mailto:$address">$orig_email</a>|;
              },
          );
          $finder->find(\$text);

SUBCLASSING
        To change how this module finds email addresses, write a subclass
        of Email::Find that overrides the "addr_regex" and "do_validate"
        methods.

        For example, the following class additionally finds email
        addresses with a dot immediately before the at sign. Such
        addresses are illegal under RFC 822; see the Email::Valid::Loose
        manpage for details.

          package Email::Find::Loose;
          use base qw(Email::Find);
          use Email::Valid::Loose;

          # should return the regex which Email::Find will use to find
          # strings which are "thought to be" email addresses
          sub addr_regex {
              return $Email::Valid::Loose::Addr_spec_re;
          }

          # should check whether $addr is a valid email address.
          # if so, return the address as a string.
          # else, return undef
          sub do_validate {
              my($self, $addr) = @_;
              return Email::Valid::Loose->address($addr);
          }

        Here is another example, which uses the Mail::CheckUser module to
        check whether the address actually exists.

          package Email::Find::Existent;
          use base qw(Email::Find);
          use Mail::CheckUser qw(check_email);

          sub do_validate {
              my($self, $addr) = @_;
              return check_email($addr) ? $addr : undef;
          }

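        In the same spirit, here is a sketch of a subclass that keeps only
        addresses from an allow-list of domains, together with the code
        that uses it. The class name Email::Find::OnlyDomain and the
        %Allowed list are hypothetical. The sketch assumes the parent's
        "do_validate" follows the contract described in the comments
        above (the address as a string, or undef).

```perl
use strict;
use warnings;

package Email::Find::OnlyDomain;    # hypothetical subclass name
use base qw(Email::Find);

# Hypothetical allow-list of domains.
my %Allowed = map { $_ => 1 } qw(example.com example.org);

sub do_validate {
    my ($self, $addr) = @_;

    # Let the parent class do the syntax check first.
    my $valid = $self->SUPER::do_validate($addr);
    return undef unless defined $valid;

    # Keep the address only if its domain is on the allow-list.
    my ($host) = $valid =~ /\@([^@]+)\z/;
    return (defined $host && $Allowed{lc $host}) ? $valid : undef;
}

package main;

# Sample input; the addresses are made up for illustration.
my $text = 'alice@example.com, mallory@example.net';

my @kept;
my $finder = Email::Find::OnlyDomain->new(sub {
    my ($email, $orig_email) = @_;
    push @kept, $email->address;
    return $orig_email;
});
my $num_found = $finder->find(\$text);

print "Kept $num_found: @kept\n";
```

        Only "do_validate" is overridden here, so candidate strings are
        still located with the parent's "addr_regex".
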
CAVEATS
        Why a subset of RFC 822?
            I say that this module finds a *subset* of RFC 822 because if I
            attempted to look for *all* possible valid RFC 822 addresses I'd
            wind up practically matching the entire block of text! The
            complete specification is so wide open that it's difficult to
            construct something that's *not* an RFC 822 address.

            To keep myself sane, I look for the 'address spec' or 'global
            address' part of an RFC 822 address. This is the part which most
            people consider to be an email address (the 'foo@bar.com' part)
            and it is also the part which contains the information necessary
            for delivery.

        Why are some of the matches not email addresses?
            Alas, many things which aren't email addresses *look* like email
            addresses and parse just fine as them. The biggest headache is
            email and Usenet message IDs. I do my best to avoid them, but
            there's only so much cleverness you can pack into one library.

AUTHORS
        Copyright 2000, 2001 Michael G Schwern <schwern@pobox.com>. All
        rights reserved.

        Current maintainer is Tatsuhiko Miyagawa <miyagawa@bulknews.net>.

THANKS
        Schwern thanks Jeremy Howard for his patch to make it work under
        5.005.

LICENSE
        This module is free software; you may redistribute it and/or modify
        it under the same terms as Perl itself.

        The author STRONGLY SUGGESTS that this module not be used for the
        purposes of sending unsolicited email (i.e. spamming) in any way,
        shape or form or for the purposes of generating lists for commercial
        sale.

        If you use this module for spamming I reserve the right to make fun
        of you.

SEE ALSO
        the Email::Valid manpage, RFC 822, the URI::Find manpage, the
        Apache::AntiSpam manpage, the Email::Valid::Loose manpage