NAME

newstap - retrieve news articles and deliver them as mail messages


SYNOPSIS

    newstap


DESCRIPTION

newstap retrieves news articles from various sources and delivers them as mail messages in a variety of ways. When run, it iterates through a set of user-configured news sources, retrieves any new articles, and dispatches them through user-configured delivery methods.

Currently, news retrieval is limited to NNTP (per RFC 977) servers. For testing (or silliness), there is also a simulated news retrieval method which actually just runs /usr/games/fortune to generate messages. Other retrieval methods are planned, hopefully to cope with things like secure NNTP or Web-based news sources.

Regardless of retrieval method, newstap follows a model where each news source specifies a retrieval method, a server, and optionally a port. Within a news source, you must specify one or more groups. Clearly, newstap is oriented, start to finish, towards USENET news retrieval. There is a set of options, including the delivery method, which can be specified at any level from globally down to per-group.

All news articles are kept in RFC 822 format (or that of its successor, RFC 2822). This is the standard format for both email and USENET news. By default, newstap adds a few headers, including the standard Received: header, before delivery; this is configurable in the options as well.

At startup, newstap loads its configuration from a newstaprc file. This file is typically located in your home directory at ~/.newstaprc or at ~/.newstap/newstaprc. It is a structured file which completely configures newstap. With this version, you must have this file; there is no other way to tell newstap what to do.

Since newstap also needs to remember things in between invocations, such as the last message you retrieved in each group, it also requires a state file. This file is, by default, in ~/.newstap.state; it may also be in ~/.newstap/state, or in a custom location specified in your newstaprc file. This file will be created if it does not exist, and will be overwritten each time newstap is run. Theoretically, it is plain human-editable ASCII (and actually uses a variant of newstaprc's format), but you should not edit it. Its format may change unexpectedly, and I'm not going to document it anywhere but in the source code.


CONFIGURATION FILE

A newstaprc file is a plain text configuration file. If you want a quick start, skip down to the EXAMPLE section, copy and paste the sample file, and edit it. Use this section as a reference.

Structure

Parsing is line-oriented, so don't split a statement over multiple lines or put multiple statements on one line; neither will work. All keywords are case sensitive in this version.

Comments may occur anywhere, and are denoted by the usual shell-style `#' character. The line is truncated at the first occurrence of this character. Blank lines are allowed. Leading and trailing space is removed from each line before parsing. Whitespace is whitespace; feel free to separate words by tabs or whatever. Finally, note that words are delimited ONLY by whitespace; so, unlike some other formats, `foo{' is one token and not two. Make sure you put whitespace where I put it in my descriptions.

newstaprc allows certain types of blocks to occur. A block begins with a statement ending in a `{' and ends with the special statement `}' (alone on a line), which closes the current block.
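
As a rough sketch of these rules (news.example.com and the group names are placeholders), a server block containing one group block and one plain group statement could be laid out as follows. Note the whitespace before each `{', and that each `}' sits alone on its line:

    # Comments are kept on their own lines in this sketch
    nntp news.example.com {
        group comp.lang.lisp {
            # This option applies to comp.lang.lisp only
            headers_only
        }
        group comp.std.c++
    }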

Environment Variables and Tildes

As in the shell, newstaprc files allow you to insert the values of environment variables or your home directory using one of the following forms:

    Text            Gets Replaced With
    ----            ------------------
    ${name}         The value of the environment variable `name'.
    $name           The value of the environment variable `name'.
    ~               Your home directory.

The ${...} form is less ambiguous than the $... form. ~ first looks for a ${HOME} value; if that is not defined, it reads your passwd file entry. These may all occur anywhere within any line in the file, and are interpreted when the file is read.
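
For illustration, assuming HOME is set in your environment (the file names below are arbitrary), each of the following lines would be expanded as the file is read:

    # Braced form: unambiguous about where the variable name ends
    state_file ${HOME}/news/newstap.state
    # Tilde form: also resolves to your home directory
    delivery mbox ~/Mail/news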

Special Statements

These statements don't directly affect any newstap settings; they are useful when constructing and testing RC files.

__END__
Immediately terminates parsing of the RC file, without triggering an error. Useful if you want to put descriptive text or a variety of configuration options at the end of a file without newstap parsing them.

writeln text...
Writes text to standard error immediately (i.e., without waiting for the end of the parsing). Only really useful for debugging.
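
A small sketch of how the two statements might be used while testing an RC file; the message text and the server are made up for the example:

    writeln Global options read, starting on servers
    nntp news.example.com {
        group comp.lang.lisp
    }
    __END__
    Nothing from here on is parsed, so this space can hold notes
    or configuration lines that are switched off for now:
    delivery mbox ~/Mail/news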

General Options

The following statements set options which can occur anywhere in the file. Per-group options override per-server options; per-server options override global options; global options override defaults. You get the picture.

delivery method [args...]
Specifies a delivery method for retrieved/generated messages. The current set of delivery methods is documented below, in the DELIVERY section.

args, if present, will be scanned for % characters and formatting will be done; see FORMATTING STRINGS, below.

drop
Equivalent to delivery null. Causes all retrieved articles to be discarded.

headers_only [no]
Without no, turns on headers-only mode: message bodies are discarded and only headers are retrieved. (When delivered, they show up as having empty bodies.)

truncate size-or-no [unit]
If size-or-no is the keyword no, truncation is turned off. Otherwise, it specifies a maximum article body size (note: BODY, not TOTAL) beyond which the body is simply cut off. The default unit is bytes; you can use the keyword kbytes to multiply by 1024.

Note that this is an approximate truncation value. Typically, the body text will actually be rounded up to the next line.
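
As a sketch of how headers_only and truncate might be combined across levels (the server, groups, and the 10 kbytes figure are arbitrary choices):

    # Globally, cut article bodies off at roughly 10 kbytes
    truncate 10 kbytes
    nntp news.example.com {
        group comp.lang.lisp {
            # Fetch only the headers for this group
            headers_only
        }
        group comp.std.c++ {
            # No size limit for this group
            truncate no
        }
    }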

add_header tag value...
Each message which is delivered gets a header of the form `tag: value' added at the end of its header list. If tag already names a header in the message, that header is left unmodified. add_header strictly adds headers to the message.

value will be scanned for % characters and formatting will be done; see FORMATTING STRINGS, below.

set_header tag value...
For each message delivered, the header tag is set to the value value. If tag is not already defined in the message headers, the effects are identical to add_header. Otherwise, the first occurrence of tag in the message is replaced with the new `tag: value' pair, and any subsequent occurrences of the same tag are deleted.

value will be scanned for % characters and formatting will be done; see FORMATTING STRINGS, below.
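
To illustrate the difference between the two statements, here is a minimal sketch; the header name X-Retrieved-By and the address are invented for the example:

    # Appended to the header list; existing headers are left alone
    add_header X-Retrieved-By %v on %h
    # Replaces the first Reply-To header, if any, and deletes the rest
    set_header Reply-To nobody@example.com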

retrieve ( all | ( oldest | newest ) num [unit] )
Limits how many articles newstap will retrieve each time it runs. The default is retrieve all, which simply fetches every matching article. If you specify retrieve oldest num, at most num articles will be retrieved each time. The unit value may be bytes, kbytes, or articles; articles is the default. retrieve newest works just like retrieve oldest, except it fetches articles from newest to oldest.

Note that statements within a group override those for the enclosing server, which override those outside of any server blocks. This gives you finer-grained control over which news sources may consume your bandwidth.

initially ( all | ( oldest | newest ) num [unit] )
This works just like retrieve, except that it only applies the first time you fetch a particular group. (If you don't have a specific initially rule, then any retrieve rules still apply.)
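
A sketch of how retrieve and initially might be layered, with limits picked purely for illustration:

    # Default: at most 200 articles per run, newest first
    retrieve newest 200
    nntp news.example.com {
        # On this server, stop after about 500 kbytes, oldest first
        retrieve oldest 500 kbytes
        group comp.lang.lisp {
            # First fetch of this group only: the 50 most recent articles
            initially newest 50 articles
        }
        group comp.std.c++
    }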

Global Configuration Options

These options can only appear at the global level. They make no sense within any blocks.

state_file filename
Use this if you want newstap to save its state somewhere other than the default locations.

retrieval_method server[:port] {
This statement begins a server block. The retrieval_method must currently be one of nntp or fortune (which is really only for testing). The server should provide the hostname or IP address of the server to which to connect, and if you are using a nonstandard port, you may provide port as well. The { is required, and nothing may follow it on the line. The server block, like all blocks, is terminated by a matching } alone on a line.
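
A minimal sketch of the two global statements together; the state file path and port 8119 are arbitrary values chosen for illustration:

    state_file ~/news/newstap.state
    nntp news.example.com:8119 {
        group comp.lang.lisp
    }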

Per-Server Configuration Options

These statements can only appear immediately within a server block.

group groupname [{]
Defines a group. Every server must have at least one group. For nntp servers, this is simply the name of the newsgroup to which to subscribe. Without the {, this simply subscribes you to a group. With the {, however, it subscribes you and begins a group block. Within the group block may be any number of General Options (see above) which apply only to retrieval from that group.

auth username password
Provides a username and password for the server. Currently, this only allows the AUTHINFO SIMPLE extension for NNTP, which only works on some servers. IMPORTANT NOTE: This does NOT enable any form of encryption; your password will be sent in CLEAR TEXT over the socket. Please do not use newstap if you require secure news access. The option is provided because there exist servers which require user logins but don't support encryption.
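
Putting the per-server statements together, a server block might read as follows. This is only a sketch: the host, the credentials, and the mbox file name are all made up.

    nntp news.example.com {
        auth My-User-Name My-Password
        # Plain subscription; uses the server-level and global options
        group comp.lang.lisp
        # Subscription with its own delivery method
        group comp.std.c++ {
            delivery mbox ~/Mail/std-c++
        }
    }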


DELIVERY

newstap supports a small set of message delivery methods. It is extensible at the source code level to support new methods; however, it is more flexible to use a well-known method such as standard `mbox' format files and use other tools for ultimate delivery.

Current delivery methods include:

null
Simply discards the messages. delivery null can be abbreviated as simply drop.

mbox [|] name...
Writes messages, in standard Unix mbox format, to the given name. If | (the ``pipe'' character) is provided, name is taken to be any shell command, it is executed, and the mbox data is piped in. If | is not present, but name is either stdout or the abbreviation -, the mbox data is written to newstap's standard output. Finally, if | is missing and name is anything else, name is taken to be the name of a regular file. The mbox data is appended to that file.

If name is a regular file, then by default newstap will attempt to lock it by creating a link named ``name.lock''. If it cannot obtain this lock because some other process holds it, it will wait until it can. This locking behavior can be disabled by prefixing name with an asterisk character:

    delivery mbox * mbox-file-name

With | specified, name is interpreted as a regular shell command, and there's no reason it can't contain further pipes and redirects:

    delivery mbox | filter_one | filter_two 2>/dev/null >> output

Also note that the delivery statement applies formatting to its arguments, so you can easily do things like:

    delivery mbox ~/Mail/mbox-%s-%g
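
The standard output form is not shown above. Going by the description of name, either of these lines should write the mbox data to newstap's standard output, which may be handy when the caller redirects it:

    delivery mbox -
    delivery mbox stdout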

message [|] name...
message delivery works just like mbox delivery with one important difference: it writes one message at a time to the target. That means it starts the command (or opens the file), writes the message, and finishes before moving to the next news article. This is useful for filters such as procmail(1), which accepts one mail message per invocation and processes it. For stdout output, there's no difference at all between message and mbox, and for direct file output (i.e. without the |), message also ultimately has the same effect as mbox (but is a little slower).

smtp [host!]address
smtp delivery uses SMTP (the Simple Mail Transfer Protocol) to deliver mail to a target user's inbox. The mailbox may be local or remote. If your system is already set up to receive SMTP mail (most Unix-like systems are), this might require the least configuration of any of the options.

SMTP is a network protocol whereby newstap will connect to a server and deliver messages. If you specify host, followed by a bang, before the address (no space may occur within host or between it and the bang), that host will be the one that newstap connects to. Otherwise, if your target address contains a hostname (i.e. user@host), that will be the host that newstap connects to. If no host is specified in either way, newstap will by default connect to localhost.

address may either be a ``bare address'', such as trickey or trickey@foo.bar, or it may be an RFC 822-style full name plus address, such as Aaron Trickey <trickey@foo.bar>. In the latter form, you are free to use any text you choose, as long as whatever is inside the < and > is a valid email address. In the former form, the whole thing must be a valid email address.

Examples:

    # Deliver to my local account (most likely usage)
    delivery smtp trickey
    # Deliver to a specific account, with a prettier To: line
    delivery smtp mail.mydomain.dom!Typical User <tuser@mydomain.com>


FORMATTING STRINGS

In certain places, newstap lets you specify a so-called `formatted string'. This is a piece of text which can contain special `formatting codes' that get replaced with different values.

    Code        Replaced With
    ----        -------------
    %%          %
    %d          The current time and date, in a standard format
    %g          The current group name
    %h          The local hostname
    %p          The current server port
    %s          The current server name
    %u          The user name under which newstap is running
    %v          The name and version of the program (e.g. newstap 0.9.2)

So, for example, you might have

    add_header X-From-Newsgroup news://%s:%p/%g

which might result in something like

    X-From-Newsgroup: news://news.foo.bar:119/alt.os.linux


RETURNS

newstap will return a nonzero (error) status code if it couldn't load or parse its configuration file or if it encountered any other errors. Otherwise it will return zero.


EXAMPLE

The following .newstaprc file demonstrates a few of the software's features:

    # Filter all my news messages via procmail, just like my email
    delivery message | procmail
    nntp news.freshmeat.net {
        group fm.announce
    }
    # Note: news.example.com doesn't really exist....
    nntp news.example.com {
        auth My-User-Name My-Password
        # This server keeps articles forever; when I add a new group,
        # just catch up the 100 most recent ones:
        initially newest 100 articles
        group comp.lang.lisp
        group comp.std.c++
    }


FILES

~/.newstaprc OR ~/.newstap/newstaprc
The configuration file.

~/.newstap.state OR ~/.newstap/state OR as configured
Storage for news retrieval state (e.g. last article read per group) between invocations. Automatically generated.


BUGS

Lacking support for nontrivial NNTP authentication or secure NNTP transport. Completely single-threaded. No way to configure it except via the config file syntax.


SEE ALSO

Similar in spirit, and a source of inspiration: fetchmail(1)

How I process messages I retrieve: procmail(1)

How I read those messages: mutt(1)


CONFORMS TO

RFC 977 - Network News Transfer Protocol

RFC 822 - Standard for the Format of ARPA Internet Text Messages


AUTHOR

Aaron Trickey <amtrickey@users.sourceforge.net>


HISTORY

When initially looking for a quick way to grab news articles into procmail(1), I came across a small Perl program called `fetchnews' that did the job. Well, I started cleaning it up, fixing some bugs, and adding some features, but got carried away and decided to rewrite it, as much for fun as for functionality. Hence newstap. Fetchnews is available at <http://files.moo.ca/~laotzu/fetchnews.html>, and was written by Matthieu Fenniak.