08.08.08

Update to All Tweets Considered

Posted in NPR API at 12:01 am by Jason

I made two big changes and a few minor tweaks to the All Tweets Considered app.

The first thing I did was switch from the search.twitter.com Atom API to their JSON API. The big difference is that the JSON output includes the correct date and time for each tweet. I think the app is much more usable if you know when something was said, not just what was said. That’s especially useful for search terms that don’t get new tweets very often, where the conversation may have happened months ago.
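Here’s a rough Python sketch of pulling the date, author, and text out of a JSON search response and building a link back to the poster. It is not the app’s code. The sample response is made up, and the field names ("results", "from_user", "text", "created_at"), the RFC 2822 date format, and the profile URL pattern are all assumptions to check against real output.

```python
import json
from email.utils import parsedate_to_datetime

# Made-up sample shaped like a JSON search response; field names are assumptions.
SAMPLE = """
{"results": [
  {"from_user": "nprfan", "text": "Loved today's ATC segment",
   "created_at": "Fri, 08 Aug 2008 04:01:00 +0000"}
]}
"""

def parse_tweets(raw):
    """Return (user, text, datetime, profile_url) tuples from a JSON search response."""
    data = json.loads(raw)
    tweets = []
    for result in data.get("results", []):
        # Assumes created_at is an RFC 2822 date string.
        when = parsedate_to_datetime(result["created_at"])
        user = result["from_user"]
        # Link back to the original poster (URL pattern is an assumption).
        profile_url = "http://twitter.com/" + user
        tweets.append((user, result["text"], when, profile_url))
    return tweets

for user, text, when, url in parse_tweets(SAMPLE):
    print(when.strftime("%b %d, %Y %H:%M"), user, text, url)
```

Parsing the date into a real datetime (rather than echoing the string) makes it easy to show relative times like "3 months ago" later.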

I’m not super happy with the styling of the date. It’s pushed over to the right like I wanted, but its vertical position in the result box varies with the length of the tweet. Maybe I should move it to the top right instead of the bottom right; it might be easier to position that way.

While updating the HTML for the switch to JSON, I added links back to the original poster.

The second change I made was to add more calls to the NPR API. I added a JavaScript call to get the top three stories from the NPR News topic, which was very easy to do using the Query Generator. I figured that anyone using the All Tweets app is probably the right audience for actual NPR content, and there seemed to be enough space for it.

I also added a mouseover event to every link in the navigation. It opens a popup div next to the menu showing the top story for whichever program, topic, or person the user is hovering over. I do this with an iframe that uses the HTML widget as its source, plus a little CSS styling to make it look seamless.

I put this one in just for fun, and to see if it would work. I’m torn. I think it’s a neat effect, and definitely a blast to go down the list and see a wide range of NPR content. However, it’s very busy and doesn’t go with the Twitter theme. It’s just extraneous. It may be something that’s a better fit with a different type of app.

That said, I’m going to keep it in for now. My justification is the Rule of Cool.

search.twitter.com itself has a very slick way of checking back with the server and showing that there are new results for your query. I’m thinking of doing something similar. I don’t think it would be too hard using AJAX.
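One way the "new results" check could work, sketched in Python: the page remembers the newest tweet id it has shown, and on an AJAX timer asks the server how many results are newer than that. This assumes each search result carries an increasing numeric "id" field, which is an assumption rather than something confirmed here. If the search API accepts a since_id-style parameter, the filtering could happen on Twitter’s side instead.

```python
def newest_id(results, default=0):
    """Highest tweet id in a batch; the page keeps this as its marker."""
    return max((r["id"] for r in results), default=default)

def count_new_results(results, last_seen_id):
    """Count results newer than the last id already shown.

    Assumes a numeric, increasing "id" field on each result (an assumption).
    """
    return sum(1 for r in results if r["id"] > last_seen_id)

first_poll = [{"id": 101}, {"id": 100}]
marker = newest_id(first_poll)
second_poll = [{"id": 104}, {"id": 102}, {"id": 101}]
print(count_new_results(second_poll, marker), "new results")
```

The browser side would only need the count to display a "N new results" banner, and could fetch the tweets themselves when the user clicks it.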

08.05.08

Finding relevant keywords

Posted in NPR API at 11:31 pm by Jason

One thing NPR doesn’t provide is contextual keywords for its stories. Stories can be assigned to topics, series, columns, and programs, and certain music stories are assigned to artist and genre pages. That doesn’t really tell you what the story is about. The title, teaser, and miniteaser each provide a summary in their own way, but they don’t lend themselves to grouping related stories. Search is only useful once you have an idea what you’re looking for; it doesn’t help you find interesting new content or follow trends as they rise and fall.

I’ve been thinking about how to go about automatically figuring out the keywords from the text. I’m trying to decide between writing my own keyword engine, or finding an API to do it for me.

The benefit to using a third party API is obvious. It means I don’t have to reinvent the wheel and I can make use of other people’s expertise. I admit that I don’t know all the science behind this kind of text parsing, so my implementation would definitely be bare bones. On the other hand, I have an advantage that a third party app wouldn’t have. I know that I’m parsing a news story and I even know what topics the editors assigned the story to. Having that information can only improve the results. Plus, I’d probably learn a lot by trying to do it myself.

A basic Google search turns up a few services that do what I’m looking for, though most of them are geared towards linking to ads and SEO. There’s a lot of research in this area, but I’m not looking for something that would have satisfied my computer science profs, just something good enough to be interesting. Maybe I should pull out my old algorithms book, the one that’s been gathering dust for 10+ years.

Let’s try an example, and see where we can go with it.

Take NPR’s profile of weightlifter Melanie Roach. It’s assigned to the Sports and Nation topics, as well as the Beijing Olympics 2008 and Profiles: Bound For Beijing series. None of those would link to stories from previous Olympics, nor do they tell us that the sport is weightlifting. It might also be nice to automatically look for other information about Melanie Roach, not just on NPR but through other APIs. Besides the teaser, there’s a lot of text in the story that could be analyzed, from the full-length story to the image captions.

I’d be curious to see what a third party service would find versus just pulling out all the capitalized phrases and figuring out which uncommon words are used most often.
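The "do it myself" baseline could be as simple as this Python sketch: count runs of capitalized words, and count the most frequent words left over after dropping a stopword list. The stopword list is a tiny stand-in for "uncommon words" (a real approach would compare against word frequencies from a larger body of stories), and the sample sentence is invented for illustration, not taken from the NPR story.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be far longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "was", "are", "she", "her", "he", "his", "it", "its",
             "that", "with", "at", "by", "from", "as", "be", "has", "have",
             "this", "will"}

# Runs of one or more capitalized words, e.g. "Melanie Roach".
CAP_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def capitalized_phrases(text):
    """Count capitalized phrases (this also catches sentence-initial words)."""
    return Counter(CAP_PHRASE.findall(text))

def frequent_words(text, top_n=5, min_len=4):
    """Most common words that are neither stopwords nor very short."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) >= min_len)
    return counts.most_common(top_n)

SAMPLE = ("Melanie Roach is a weightlifter. Roach will lift in Beijing. "
          "Weightlifting is a demanding sport, and Roach is ready.")
print(capitalized_phrases(SAMPLE).most_common(3))
print(frequent_words(SAMPLE, top_n=3))
```

Even this crude version surfaces the athlete’s name and the sport, which the assigned topics and series don’t.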

One idea I have is to look at the stories for a particular day and figure out what NPR thought was important that day. I think that would be an interesting way to browse the news and watch it ebb and flow.

The other idea I’m thinking about is generating a tag cloud for a particular time frame. You could not only figure out the keywords for the stories but also keep track of which ones were used most, and then display them in a visually interesting way.
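A minimal Python sketch of the tag cloud step, assuming you already have keyword counts for the time frame: map each count linearly onto a font-size range and emit styled spans. The pixel range and the sample counts are arbitrary choices for illustration.

```python
from html import escape

def tag_cloud_sizes(counts, min_px=12, max_px=32):
    """Map keyword counts to font sizes (in px) by linear interpolation."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every keyword has the same count, give them all the middle size.
        frac = 0.5 if span == 0 else (n - lo) / span
        sizes[tag] = round(min_px + frac * (max_px - min_px))
    return sizes

def tag_cloud_html(counts):
    """Render the cloud as spans in alphabetical order with inline font sizes."""
    sizes = tag_cloud_sizes(counts)
    return " ".join('<span style="font-size: %dpx">%s</span>' % (sizes[t], escape(t))
                    for t in sorted(sizes))

print(tag_cloud_html({"olympics": 10, "beijing": 6, "weightlifting": 2}))
```

Linear scaling lets one very popular keyword shrink everything else to the minimum size; scaling by the logarithm of the count is a common alternative that spreads the sizes out more evenly.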

08.03.08

All Tweets Considered – Technical Details

Posted in NPR API at 8:27 pm by Jason

All Tweets Considered: An NPR Twitter Browser

Navigation

I used the NPR API list interface to get the NPR programs, topics, and people. These are the same lists that the NPR Query Generator uses to populate its categories. Details can be found here. The results are returned in XML, which is very easy to parse in PHP using SimpleXML. The topics are divided into two subcategories, news and music. I split them into two sections in the navigation so that people wouldn’t be overwhelmed with choices. The person list is subdivided by letter. I spent some time trying to let people click on a letter and see only that group of people, but I couldn’t get the styles to look right. So I just show the whole huge list of NPR personalities. At least it’s in alphabetical order.
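For anyone not using PHP, here’s roughly the same SimpleXML-style parsing in Python with xml.etree.ElementTree, grouping list items by subcategory. The sample response is made up: the element and attribute names ("subcategory", "name", "item", "id", "title") and the ids are assumptions to check against a real list response.

```python
import xml.etree.ElementTree as ET

# Made-up list response; element/attribute names and ids are assumptions.
SAMPLE_LIST = """
<list>
  <title>Topics</title>
  <subcategory name="News">
    <item id="1"><title>News</title></item>
    <item id="2"><title>Politics</title></item>
  </subcategory>
  <subcategory name="Music">
    <item id="3"><title>Jazz</title></item>
  </subcategory>
</list>
"""

def parse_list(xml_text):
    """Group list items by subcategory: {name: [(id, title), ...]}."""
    root = ET.fromstring(xml_text.strip())
    groups = {}
    for sub in root.iter("subcategory"):
        groups[sub.get("name", "")] = [
            (item.get("id"), (item.findtext("title") or "").strip())
            for item in sub.iter("item")
        ]
    return groups

for name, items in parse_list(SAMPLE_LIST).items():
    print(name, items)
```

Grouping by subcategory is what makes it easy to give news topics and music topics their own sections in the navigation.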

The NPR API also includes lists of series, columns, and music artists that NPR has stories about. I didn’t think a Twitter search would return enough interesting, NPR-relevant results for those. However, if you’re making a navigation bar of your own, these lists are definitely worthwhile.

Useful inside tip: These lists are automatically truncated to show only the topics, series, and columns that have recent stories associated with them. This keeps the Query Generator UI from overflowing. If you add the date parameter to the API call, as in &date=YYYY-MM-DD, it will show older aggregations.
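A small Python sketch of building a list call with that date parameter. The endpoint URL and the list id used here are assumptions for illustration; check them against the API documentation, and add your API key if the call requires one.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint for the list interface.
LIST_ENDPOINT = "http://api.npr.org/list"

def list_url(list_id, as_of=None):
    """Build a list call, optionally pinned to an earlier date."""
    params = {"id": list_id}
    if as_of is not None:
        # date=YYYY-MM-DD asks for the older aggregations described above.
        params["date"] = as_of.isoformat()
    return LIST_ENDPOINT + "?" + urlencode(params)

print(list_url("12345", date(2007, 1, 15)))  # "12345" is a hypothetical list id
```

Using date.isoformat() guarantees the zero-padded YYYY-MM-DD form rather than relying on hand-built strings.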

The list query doesn’t currently support returning JSON, although I know that it might in the future. For those interested, John Tynan has published a list of topics in JSON format using Yahoo Pipes.

I used jQuery to do the fancy menus so that they slide up and down. Having all of the programs, topics, and people in one column was obviously not going to be usable.

Twitter Interface

I’m using the search.twitter.com interface. The details of their API can be found here. You don’t need to sign up for an API key to get access to it. You can automatically get an atom feed of any search using the following url:

http://search.twitter.com/search.atom?q=<query>

You can also get JSON output if that works better for you. I was parsing the results in PHP, so I decided to use the XML atom results instead. However, I’ve found that the atom results don’t output the date of the tweet correctly. The date in each entry is the date the atom feed was generated, so that’s pretty useless. I’m probably gonna switch over to using the JSON output to see if that gives me better results. I find that it’s hard to get the right context for the search results without knowing when the tweet was added.

In order to get useful results, I sometimes added extra search terms. For example, if someone selected All Things Considered, I also searched for ATC, since I know that people often use the abbreviation when talking about the show. I managed to tweak the search terms enough for all the programs so that clicking on any one of them will return relevant NPR results.
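Here’s one way to express the extra search terms in Python: a table of alternate names per program, joined into a single query with OR and URL-encoded onto the Atom search URL. It assumes the search API accepts OR and quoted phrases the same way the search box does; the table only holds the ATC example from above, and any other entries would be your own.

```python
from urllib.parse import urlencode

SEARCH_ATOM = "http://search.twitter.com/search.atom"

# Alternate names people use for a program (only the ATC example is from the post).
EXTRA_TERMS = {
    "All Things Considered": ["ATC"],
}

def build_query(program):
    """Quote the full program name and OR in any known abbreviations."""
    terms = ['"%s"' % program] + EXTRA_TERMS.get(program, [])
    return " OR ".join(terms)

def search_url(program):
    return SEARCH_ATOM + "?" + urlencode({"q": build_query(program)})

print(search_url("All Things Considered"))
```

Letting urlencode handle the quotes and spaces avoids hand-escaping mistakes when program names contain punctuation.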

Parsing tinyurls

The trickiest part of this whole mashup is figuring out which links in the tweets actually point to NPR stories. To save space in a character-limited tweet, most people use TinyURL or some other URL-shortening site. They all appear to work the same way: when someone clicks on the link, the server responds with an HTTP status of 301 Moved Permanently or 302 Found, and the original link is in the HTTP headers as Location: [new url]. When I find a link in a tweet, I do an HTTP GET and look in the response headers for the Location field. Once I have it, I check whether it’s an npr.org URL. An easy way to do that is to look for a URL that has storyId= or prgId= in it. At that point it’s easy to pull out the NPR id for the story or program and pass it to the NPR API.
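A rough Python version of those two steps. http.client is used because, unlike urllib.request, it does not follow redirects on its own, so the Location header of the 301/302 is visible. Only 301 and 302 are handled, matching the behavior described above; chained shorteners or relative Location values would need extra handling. The example npr.org URLs in the tests are illustrative.

```python
import http.client
from urllib.parse import urlsplit, parse_qs

def resolve_short_url(url, timeout=10):
    """Do one GET without following redirects and return the Location
    header of a 301/302 response, or None if there is no redirect."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.getheader("Location") if resp.status in (301, 302) else None
    finally:
        conn.close()

def extract_npr_id(url):
    """Return ("story", id) or ("program", id) for an npr.org URL, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "npr.org" and not host.endswith(".npr.org"):
        return None
    query = parse_qs(parts.query)
    if "storyId" in query:
        return ("story", query["storyId"][0])
    if "prgId" in query:
        return ("program", query["prgId"][0])
    return None
```

For a link found in a tweet, extract_npr_id(resolve_short_url(link) or link) handles both shortened links and direct npr.org links. Parsing the query string, rather than searching the raw URL for "storyId=", avoids false matches on other sites.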

NPR API

I use the following query to access the API from the id:

http://api.npr.org/query?id=[id]&output=HTML&apiKey=[apikey]
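The same query built in Python, with an optional cap on the number of results. The numResults parameter name is an assumption on my part (it’s the kind of option the Query Generator exposes for something like the top three stories from the NPR News topic); confirm it in the Query Generator before relying on it. The id and key in the example are placeholders.

```python
from urllib.parse import urlencode

QUERY_ENDPOINT = "http://api.npr.org/query"

def npr_query_url(npr_id, api_key, output="HTML", num_results=None):
    """Build a query URL for a story/program id using the HTML widget output."""
    params = {"id": npr_id, "output": output, "apiKey": api_key}
    if num_results is not None:
        # Assumed parameter name for limiting how many stories come back.
        params["numResults"] = num_results
    return QUERY_ENDPOINT + "?" + urlencode(params)

print(npr_query_url("12345", "MY_KEY", num_results=3))  # placeholder id and key
```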

There was no reason for me to reinvent the wheel, so I just used the HTML widget that NPR provides. It gives the title, with a link back to the story, the date, the teaser (just a short summary of the article), and several ways to listen to the story. That was more than enough for my purposes, so I didn’t bother to parse the NPRML. If I wanted to show images or the full text of the story, I would use that.

Useful inside tip: The HTML produced by the widget uses CSS classes for every element so that it can be styled to fit the look and feel of the site it’s on. All I did was put a border around it. My CSS expertise is sorely lacking, but I’m trying to learn it better.

And that’s it in a nutshell. It’s still a work in progress, but I think it works pretty well already. My main complaint is that the CSS is pretty haphazard so it looks different in Firefox and IE, and I think it could look a lot slicker. Anybody out there with some pointers they could offer to make things look better?

If you want to look at the source code, I’ve set up a Subversion repository at http://www.jgrosman.com/svnalltweets.

If anybody has any questions about how the NPR API works, or wants to brainstorm some mashups, let me know. I love seeing what people are doing with it already, and it’s only been out a couple of weeks. I think the possibilities are endless.