JDOM in the Real World, Part 3
By Jason Hunter
This installment shows how to use JDOM inside your own Java programs.
In the first two parts of this series, published in earlier issues of Oracle Magazine, I introduced JDOM, an open-source library for Java-optimized XML manipulation, and gave an overview of its design, features, and interfaces. In this article, I show two practical applications developed using JDOM to handle RDF site summary (RSS) news feed processing.
The first application shows how to build a server-side component to display RSS feeds on a Web site as a list that can be plugged into any page. RSS is one of the most widely used applications of XML, helpful for everything from announcing new Web log entries to sharing links to new magazine articles. With practical code examples, I'll show you how JDOM helps parse the RSS XML.
The second application is a client-side component that consumes RSS feeds and e-mails interested parties when new content appears. It's configured using an XML file, so I'll show you how to use JDOM at startup to read configuration information and to manipulate the incoming RSS.
RSS and Meerkat
RSS is a Web syndication format based on XML. It's a simple format allowing Web sites to publish a list of links with titles and descriptions that other sites or third-party clients can access and use. Using RSS, newspapers spread links, Web loggers (called bloggers, for short) announce their posts, and everyone keeps up with what's new. Interested in JDOM? With RSS, you can track the latest JDOM links through a feed. It's not a Web page; it's an XML file containing structured links. The accompanying listing shows an RSS format example that covers JDOM.
The RSS name stands for RDF site summary, rich site summary, or really simple syndication, depending on whom you ask. There are also several dialects of RSS in active use, varying from simple and easy (the older forms) to extensible and complicated (the newer forms).
Meerkat, launched by the O'Reilly Network in 2000 as a free service, pulls together the disparate RSS feeds on the internet and makes them centrally available and searchable. Do you want to know about every article on servlets posted in the last 21 days? You can query the Meerkat Web interface or, once you learn the query string pattern, make a direct request.
Meerkat pulls RSS from syndicators and outputs to searchers in formats including RSS, XML, and HTML. For my example, I'll be consuming Meerkat content in its custom XML format, which is similar to RSS but adds extra category, channel, and time stamp information. The following is a sample Meerkat query:
On the Server Side: Embedding RSS in a Web Page
The following application pulls RSS data from either a local file or a remote URL and displays it as a formatted list in a section of an HTML page. The application must meet the following requirements: support both RSS 0.91 (a widely used form, which is much simpler than 1.0) and the Meerkat XML format, allow nonprogrammer configuration of the generated HTML, and cache the content with updates every 30 minutes. In production, the code drives a front-page "What's New" section from an RSS data file, which is itself autogenerated by the MovableType blog tool.
For implementation tools, I'm going to use JDOM to read the XML, a servlet container or application server to serve the content, and the "Tea" framework to handle the display. Odds are, you haven't heard of Tea. Disney created the Tea framework for use on its high-traffic sites, which are constantly updated, and the company released it under an open-source license a couple of years ago. Tea supports an elegant development model, comes with an advanced integrated development environment (IDE) called Kettle, and compiles Tea pages directly to Java bytecode (no intermediary servlet class is necessary). It works with any servlet container.
The accompanying listing contains the main logic class, MeerkatContext. It acts as a back end for Tea templates. Every public method in this class gets exposed to the Tea front end. The class has just one public method, getStories(String), which takes a URL and returns an array of Story objects. The class supports URLs with either "http:" or "file:" protocols and supports content in either RSS or Meerkat XML format. The class caches each URL's content in the storyCache map, with the corresponding time stamps in the timeCache map. Only if the storyCache has no entry for the URL or the cached time stamp is older than "halfHourAgo" will new content be fetched. Notice how the class caches content internally and doesn't rely on external crontab entries. Crontabs, while heavily used, are external to the Web server and don't move easily between machines. Thus, they should be avoided whenever possible.
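The caching rule can be sketched with plain JDK classes. This is only an illustration, not the listing's code: the class name TimedCache, the Supplier-based loader, and the injectable clock are assumptions made so that the example stands alone. Only the two maps (storyCache and timeCache) and the 30-minute window come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of a per-URL cache that refetches after 30 minutes. */
final class TimedCache<V> {
    private static final long MAX_AGE_MILLIS = 30L * 60L * 1000L;

    private final Map<String, V> storyCache = new HashMap<>();
    private final Map<String, Long> timeCache = new HashMap<>();
    private final LongSupplier clock; // injectable so the expiry rule can be exercised

    TimedCache(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns the cached value; calls the loader only if the entry is missing or stale. */
    synchronized V get(String url, Supplier<V> loader) {
        long now = clock.getAsLong();
        long halfHourAgo = now - MAX_AGE_MILLIS;
        Long fetchedAt = timeCache.get(url);
        if (fetchedAt == null || fetchedAt < halfHourAgo) {
            storyCache.put(url, loader.get());
            timeCache.put(url, now);
        }
        return storyCache.get(url);
    }
}
```

In real use the clock would simply be System::currentTimeMillis, and the loader would do the fetch-and-parse work.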
To read the content, the MeerkatContext class gets an InputStream with getStoryStream() and then in getStoryDocument() uses SAXBuilder to build a JDOM representation of the content. The getStoryList() method quickly walks the JDOM tree, converting JDOM Element objects into Story objects. The getChildText() method is heavily used to directly read child text content.
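As a rough sketch of that build-then-walk pattern, the following reads RSS 0.91 items with SAXBuilder and getChildText(). It assumes JDOM 2.x on the classpath (package org.jdom2; the releases current when this article was written used org.jdom). The RssReader and Item names are hypothetical stand-ins for the article's classes, and wrapping the checked exceptions in an unchecked one is a simplification chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

/** Hypothetical sketch: convert RSS 0.91 item elements into simple value objects. */
final class RssReader {
    /** Minimal stand-in for the article's Story class. */
    static final class Item {
        final String title;
        final String link;
        final String description; // null when the feed omits it

        Item(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    static List<Item> parse(String rssXml) {
        Document doc;
        try {
            doc = new SAXBuilder().build(new StringReader(rssXml));
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException("Could not parse feed", e);
        }
        List<Item> items = new ArrayList<>();
        // RSS 0.91 nests its items inside a single channel element.
        Element channel = doc.getRootElement().getChild("channel");
        if (channel == null) {
            return items;
        }
        for (Element item : channel.getChildren("item")) {
            // getChildText returns null when the named child is absent.
            items.add(new Item(item.getChildText("title"),
                               item.getChildText("link"),
                               item.getChildText("description")));
        }
        return items;
    }
}
```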
There are two code paths, one for Meerkat files and one for RSS files. The RSS path loads less information because less is available, although you'll notice that the getTimestampComment() method supports reading a time stamp out of a leading XML comment. This comment is a custom RSS extension that helps display a date at the bottom of entries.
The meerkat.tea template file appears in the accompanying listing. This file draws on the MeerkatContext back end to render customizable HTML. The template accepts a URL to its "constructor" that gets passed using a request parameter (/meerkat.tea?url=xxx) or another template's direct call. If there's no URL, it sets the default URL to the ONJava feed. The template then customizes how dates and null values should be displayed. Next, it makes the call into the back end to retrieve the Story objects. If the array is null or empty, an error is printed in the page; otherwise the template loops over the array, printing HTML for each entry. Templates have access to the bean properties on all objects returned by functions using the syntax <% bean.property %>.
The following code concludes our application with a portion of an index.tea page that would call on meerkat.tea.
<!-- page snippet -->
<b>What's New at OTN</b><br />
<% call meerkat("/ws/otnrss.xml") %>
Any Tea template page can add an automatically updating RSS or Meerkat data feed with one simple <% call meerkat(url) %> command. Behind the scenes, JDOM takes care of the XML parsing, Tea takes care of the templating, Meerkat handles the collection, and servlets glue it all together.
On the Client Side: Blog Alert
My sample client-side application, which I call Blog Alert, consumes RSS and Meerkat feeds and, after detecting a change, sends e-mail notifications to interested parties. It supports instant notification or allows updates to be queued for daily or weekly digest mails. The e-mails can have customized subjects and footers with substitution rules allowing the insertion of title, category, and date information. The accompanying sample shows what e-mail notifications look like.
Blog Alert runs off an external XML-based configuration file that provides the mail host information (for sending mails) and a set of mailing entries. Each mailing has a type (instant, daily, weekly), an RSS source, an e-mail sender address, an e-mail recipient address (such as a mailing list), a subject, and an optional footer. Subjects and footers support %title, %category, and %date substitutions. The accompanying configuration listing shows an example that gives instant notice of Servlets.com changes, weekly notices of JDOM changes, and daily notices of XMLHack changes.
The application code starts with the BlogAlert class, which holds the main() entry point and coordinates the processes. It starts by loading the configuration file (blogalert.xml) into a Config object.
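One way to implement the %title, %category, and %date substitutions is a single regular-expression pass, sketched below with JDK classes only. The TemplateExpander name and the map-based signature are assumptions for illustration; the article's own code may handle substitution differently. Expanding in one pass means that a story title that happens to contain the text "%date" is not expanded a second time.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of subject and footer substitution. */
final class TemplateExpander {
    private static final Pattern TOKEN = Pattern.compile("%(title|category|date)");

    /** Replaces each known token with its value; missing values become empty strings. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // quoteReplacement keeps "$" and "\" in story text from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```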
Then the handleMailing() method follows a multistep algorithm. It gets the current list of stories from the RSS feed, loads the previous list of stories from disk, stores the current list to be the previous list for the next time handleMailing() is called, and then determines what new stories have appeared. Next, the logic determines what stories were pending (new but not sent out yet) and removes from the current list any in the pending list. Then handleMailing() creates and sets a new pending list and looks to see if it's time to send the e-mail. If so, it sorts the pending list and sends an e-mail.
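The list bookkeeping in that algorithm comes down to two set operations, sketched here with plain strings standing in for stories (for example, their links). The PendingTracker class and its method names are hypothetical, and the sketch leaves out file storage, sorting, and sending.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the new-story and pending-list bookkeeping. */
final class PendingTracker {
    /** Stories present in the current feed but absent from the previous one, in feed order. */
    static List<String> newStories(List<String> current, List<String> previous) {
        Set<String> fresh = new LinkedHashSet<>(current);
        fresh.removeAll(previous);
        return new ArrayList<>(fresh);
    }

    /** The old pending list followed by any new stories that are not already pending. */
    static List<String> mergePending(List<String> pending, List<String> fresh) {
        Set<String> merged = new LinkedHashSet<>(pending);
        merged.addAll(fresh);
        return new ArrayList<>(merged);
    }
}
```

A LinkedHashSet keeps the original ordering while discarding duplicates, so a story that is already pending is never queued twice.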
The Config class is responsible for loading the blogalert.xml data file and returning the information with the getMailHost() and getMailings() methods. Almost the entire class consists of JDOM calls.
The Mailing class represents each configured mailing item and makes available to the main logic the mailing's ID, type, RSS location, sender, recipient, subject, footer, time stamp, current stories, previous stories, and pending stories. The class also has logic to determine if it's time to send and has a send() method that initiates the sending.
Notice that the previous and pending story lists are maintained as external XML files (mailingname.lastrss and mailingname.pending) in the Mailing class. The reading and writing of these lists is managed by the Story class (discussed later), an enhanced version of the class shown earlier running on the server side. For a time stamp, the Mailing class uses a mailingname.timestamp file storing a string representation of the time. It's separate and readable because, as an administrator, I find it sometimes useful to tweak the time, for example, to force-send a weekly message. The isTimeToSend() method looks at the mailing type and how much time has elapsed and returns a boolean. The send() method uses the MailSender class to send e-mails and a WrapFormat class to handle consistent indenting. (These two classes are interesting but orthogonal to the point of this article, so their code is not discussed here.)
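The timing decision might look something like the following sketch. The SendSchedule class, the Type enum, and the exact thresholds (24 hours for daily, 7 days for weekly, and "always send" when nothing has been sent before) are assumptions; the article states only that the method considers the mailing type and the elapsed time.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of an isTimeToSend() decision. */
final class SendSchedule {
    enum Type { INSTANT, DAILY, WEEKLY }

    static boolean isTimeToSend(Type type, Instant lastSent, Instant now) {
        if (type == Type.INSTANT || lastSent == null) {
            return true; // assumption: instant mailings and first runs always send
        }
        Duration elapsed = Duration.between(lastSent, now);
        Duration required = (type == Type.DAILY) ? Duration.ofDays(1) : Duration.ofDays(7);
        return elapsed.compareTo(required) >= 0;
    }
}
```

Under this scheme, moving the stored time stamp back by a week is enough to force a weekly digest on the next run.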
The last class I'll examine is a grown-up version of the Story class, based on the simple Story class I presented earlier. Now it's enhanced with methods to read and write story lists to and from files and implements the Comparable interface to support date-based sorting.
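Date-based sorting through Comparable can be sketched as follows. DatedStory is a hypothetical, stripped-down stand-in for the Story class, and both the oldest-first ordering and the placement of undated stories at the end are assumptions made for the example.

```java
import java.util.Date;

/** Hypothetical sketch of a story that sorts by date, oldest first. */
final class DatedStory implements Comparable<DatedStory> {
    final String title;
    final Date date; // may be null when the feed carries no time stamp

    DatedStory(String title, Date date) {
        this.title = title;
        this.date = date;
    }

    @Override
    public int compareTo(DatedStory other) {
        if (date == null && other.date == null) {
            return 0;
        }
        if (date == null) {
            return 1; // undated stories sort after dated ones
        }
        if (other.date == null) {
            return -1;
        }
        return date.compareTo(other.date);
    }
}
```

With this in place, Collections.sort(list) orders a list of stories without needing a separate Comparator.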
The setStoryList(List, File) method contains the bulk of the new Story class logic. This method constructs an in-memory, JDOM-modeled, RSS-style document from the passed-in Story list and uses JDOM's XMLOutputter to pretty-print it to a local file. These files support the retrieval of previous and pending entries. I could have kept the original RSS or Meerkat feed around, but given how easy it is to use JDOM to create XML output, it's not worth the bother.
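To give a feel for how little code the output side needs, here is a minimal sketch that builds an RSS-style tree and pretty-prints it. It assumes JDOM 2.x (package org.jdom2). The RssWriter class is hypothetical, each story is reduced to a title-and-link pair, and the result is returned as a string rather than written to a file so that the example stays self-contained.

```java
import java.util.List;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

/** Hypothetical sketch: build an RSS 0.91-style document and pretty-print it. */
final class RssWriter {
    /** Each story is a two-element array: {title, link}. */
    static String toRss(String channelTitle, List<String[]> stories) {
        Element channel = new Element("channel");
        channel.addContent(new Element("title").setText(channelTitle));
        for (String[] story : stories) {
            Element item = new Element("item");
            item.addContent(new Element("title").setText(story[0]));
            item.addContent(new Element("link").setText(story[1]));
            channel.addContent(item);
        }
        Element rss = new Element("rss").setAttribute("version", "0.91");
        rss.addContent(channel);
        // The outputter escapes reserved characters such as "&" automatically.
        return new XMLOutputter(Format.getPrettyFormat()).outputString(new Document(rss));
    }
}
```

To write to a file instead, the same outputter's output(Document, Writer) method can be given a FileWriter.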
Run the BlogAlert class periodically and watch as it pulls down RSS feeds, examines the feeds for new material, and fires notifying e-mails when appropriate. You'll see how JDOM helps read the configuration file, handles the incoming RSS feeds, and supports the outgoing RSS storage duties.
JDOM is a simple and straightforward way to handle XML files with Java. Written in and for Java, JDOM provides you with an intuitive way to read, write, and manipulate XML documents. Best of all, JDOM has been published under an open-source Apache-style license with a wide user and developer community that has developed the application-programming interface (API) to solve real-world problems. JDOM is also in the process of going through Sun's Java Community Process (JCP) as a Java Specification Request (JSR)—the first open-source project to become a JSR.
This article walks you through two practical uses of XML and JDOM for manipulating an RSS news feed. They're inspired by my real-life needs managing the Servlets.com site, and I hope the use of JDOM in these examples can help you with your real-life needs too.