08.03.08
All Tweets Considered – Technical Details
All Tweets Considered : An NPR Twitter Browser
Navigation
I used the NPR API list interface to get the NPR programs, topics, and people. These are the same lists that the NPR Query generator uses to populate the different categories. Details can be found here. The results are returned in XML, which is very easy to parse in PHP using SimpleXML. The topics are subdivided into two subcategories, news and music. I decided to split them into two sections in the navigation so that people wouldn’t get overwhelmed with choices. The person list is subdivided into subcategories by letter. I spent some time trying to let people click on a letter and see only that group of people, but I wasn’t able to make the styles look right. So, I just show the whole huge list of NPR personalities. At least it’s in alphabetical order.
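As a rough sketch of the parsing step (in Python rather than the PHP the site uses), the snippet below groups list items by subcategory. The element and attribute names (list, subcategory, item, title, name, id) and every id in the sample are assumptions made for illustration, not a copy of the real response; check them against the actual XML before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a list response; the tag names,
# attribute names, and ids are illustrative assumptions.
SAMPLE = """<list id="3002">
  <title>Topics</title>
  <subcategory name="News">
    <item id="1001"><title>Business</title></item>
    <item id="1002"><title>Politics</title></item>
  </subcategory>
  <subcategory name="Music">
    <item id="2001"><title>Jazz</title></item>
  </subcategory>
</list>"""


def group_items(xml_text):
    """Return {subcategory name: [(item id, item title), ...]}."""
    root = ET.fromstring(xml_text)
    groups = {}
    for sub in root.iter("subcategory"):
        name = sub.get("name", "")
        groups[name] = [
            (item.get("id"), item.findtext("title", default=""))
            for item in sub.iter("item")
        ]
    return groups


if __name__ == "__main__":
    # Print one navigation section per subcategory.
    for section, items in group_items(SAMPLE).items():
        print(section)
        for item_id, title in items:
            print("  ", item_id, title)
```

Keeping the subcategory as the dictionary key makes it easy to render news and music as separate navigation sections.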
The NPR API also includes lists of series, columns, and music artists that NPR has stories about. I didn’t think a Twitter search on those would return enough interesting NPR-relevant results. However, if you’re making a navigation bar of your own, these lists are definitely worthwhile.
Useful inside tip: These lists are automatically truncated to show only topics, series, and columns that have recent stories associated with them. This is to make sure that the query generator UI isn’t overflowing. If you include the date parameter in the API call, i.e. &date=YYYY-MM-DD, it will show older aggregations.
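A small sketch of adding that parameter, again in Python. The helper name with_date is hypothetical, and the example.org address is only a placeholder for whatever list URL you are already calling; the only detail taken from the tip above is the date=YYYY-MM-DD format.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_date(list_url, since):
    """Return list_url with a date=YYYY-MM-DD parameter set to `since`."""
    parts = urlsplit(list_url)
    # Drop any existing date parameter so it is not sent twice.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "date"]
    params.append(("date", since.isoformat()))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Placeholder URL; substitute the real list call.
    print(with_date("http://example.org/list?id=1", date(2008, 1, 15)))
```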
The list query doesn’t currently support returning JSON, although I know that it might in the future. For those interested, John Tynan has published a list of topics in JSON format using Yahoo Pipes.
I used jQuery to do the fancy menus so that they slide up and down. Having all of the programs, topics, and people in one column was obviously not going to be usable.
Twitter Interface
I’m using the search.twitter.com interface. The details of their API can be found here. You don’t need to sign up for an API key to get access to it. You can get an Atom feed of any search using the following URL:
http://search.twitter.com/search.atom?q=<query>
You can also get JSON output if that works better for you. I was parsing the results in PHP, so I decided to use the XML Atom results instead. However, I’ve found that the Atom results don’t output the date of the tweet correctly. The date in each entry is the date the Atom feed was generated, so that’s pretty useless. I’m probably gonna switch over to the JSON output to see if that gives me better results. It’s hard to get the right context for the search results without knowing when the tweet was posted.
In order to get useful results, I sometimes added extra search terms. For example, if someone selected All Things Considered, I also searched for ATC, since I know that people often use the abbreviation when talking about the show. I managed to tweak the search terms enough for all the programs so that clicking on any one of them will return relevant NPR results.
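Here is one way the expanded query could be put together, sketched in Python. The post doesn’t say how the extra terms were combined, so quoting the full program name and joining terms with OR are assumptions about the search syntax; the helper names are hypothetical, and the base URL is the one given above.

```python
from urllib.parse import urlencode

# Search endpoint as given in the post.
SEARCH_BASE = "http://search.twitter.com/search.atom"


def build_query(program, aliases=()):
    """Quote the program name and OR it with any abbreviations."""
    terms = ['"%s"' % program] + list(aliases)
    return " OR ".join(terms)


def search_url(program, aliases=()):
    """Return the Atom search URL with the query percent-encoded."""
    return SEARCH_BASE + "?" + urlencode({"q": build_query(program, aliases)})


if __name__ == "__main__":
    print(search_url("All Things Considered", ["ATC"]))
```

A per-program table of aliases (for example a dictionary keyed by program name) would keep the tweaks in one place.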
Parsing tinyurls
The trickiest part of this whole mashup is finding out which links in the tweets are actually NPR stories. To save space in the character-limited tweet, most people use tinyurl or some other URL-shortening site. They all appear to work the same way. When someone clicks on the link, they receive an HTTP status of 301 Moved Permanently or 302 Found. The original link is in the HTTP headers as Location: [new url]. When I find a link in the tweet, I do an HTTP GET and look in the headers for the Location field. When I find it, I check whether it is an npr.org URL. An easy way to do that is to look for a URL that has storyId= or prgId= in it. At that point it’s easy to get the NPR id for the story or program and pass it to the NPR API.
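The two steps can be sketched like this in Python (the site itself does this in PHP). Both function names are hypothetical, and the host check is an extra safeguard the post doesn’t mention. The request is made without following redirects, so the Location header can be read directly.

```python
import http.client
from urllib.parse import parse_qs, urlsplit


def resolve_redirect(short_url, timeout=10):
    """GET short_url without following redirects; return Location or None."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        if resp.status in (301, 302):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def npr_id_from_url(url):
    """Return ("storyId" or "prgId", value) for an npr.org URL, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Assumption: only npr.org and its subdomains count as NPR links.
    if host != "npr.org" and not host.endswith(".npr.org"):
        return None
    params = parse_qs(parts.query)
    for key in ("storyId", "prgId"):
        if key in params:
            return key, params[key][0]
    return None
```

Typical use would be to call resolve_redirect on each link in a tweet and, if it returns a URL, pass that to npr_id_from_url. Shorteners that chain several redirects would need the first step repeated.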
NPR API
I use the following query to access the API from the id:
http://api.npr.org/query?id=[id]&output=HTML&apiKey=[apikey]
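A minimal Python sketch of filling in that template. The function name is hypothetical and the key in the example is a placeholder; the endpoint and the id, output, and apiKey parameters are the ones shown above.

```python
from urllib.parse import urlencode

# Endpoint as given in the query above.
QUERY_BASE = "http://api.npr.org/query"


def story_widget_url(npr_id, api_key):
    """Return the query URL that asks for the HTML widget for npr_id."""
    params = {"id": npr_id, "output": "HTML", "apiKey": api_key}
    return QUERY_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder, not a real key.
    print(story_widget_url("12345", "YOUR_KEY"))
```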
There was no reason for me to reinvent the wheel, so I just used the HTML widget that NPR provides. It gives the title, with a link back to the story, the date, the teaser (just a short summary of the article), and several ways to listen to the story. That was more than enough for my purposes, so I didn’t bother to parse the NPRML. If I wanted to show images or the full text of the story, I would use that.
Useful inside tip: The HTML produced by the widget uses CSS classes for every element so that it can be styled to fit the look and feel of the site it’s on. All I did was put a border around it. My CSS expertise is sorely lacking, but I’m trying to learn it better.
And that’s it in a nutshell. It’s still a work in progress, but I think it works pretty well already. My main complaint is that the CSS is pretty haphazard so it looks different in Firefox and IE, and I think it could look a lot slicker. Anybody out there with some pointers they could offer to make things look better?
If you want to look at the source code, I’ve set up a Subversion repository at http://www.jgrosman.com/svnalltweets.
If anybody has any questions about how the NPR API works, or wants to brainstorm some mashups, let me know. I love seeing what people are doing with it already, and it’s only been out a couple of weeks. I think the possibilities are endless.
John Tynan said,
August 5, 2008 at 7:59 pm
Jason,
Great write-up. I like how you are pushing the limits of these different APIs. And the interface is great!
Thanks for the mention of the lists of NPR topics via Yahoo Pipes. I also have a list of programs, bios, music artists and music genres here:
http://pipes.yahoo.com/pipes/person.info?eyuid=yliICdc5p2vPuTLGXYaY3xI-
Are the lists automatically truncated at NPR or as part of the PHP parser? I’ll definitely check out the svn repository and come back with more questions.
Real fun! I want to tweet NPR, just to see myself come up on the All Tweets browser.
Thanks! JT
Jason said,
August 5, 2008 at 9:17 pm
The lists are automatically truncated coming from the database. I forget off the top of my head how far back they go, but they are fine for most uses. The only list it might be worth extending the time frame is the one for series, since they tend to come and go pretty regularly.
Glad you’re having fun with it.
John Tynan said,
August 7, 2008 at 10:14 am
One additional note. As I’ve been checking back on the alltweets tool, I’m getting such a glimpse into the full spectrum of things people say/tweet about NPR. This has to be some value to people at NPR who are interested in how NPR stories and the overall brand is perceived… in real time. Have the researchers and marketers or even the program directors given you any feedback?
Jason said,
August 7, 2008 at 11:11 am
It really hasn’t gotten that much exposure yet. I wouldn’t even know where to start bringing it to their attention.
I hadn’t really thought of the usefulness of the tool for people inside NPR. You’re right, though. Browsing alltweets always makes me feel like I’m standing next to a water cooler, listening to the conversations as they go past, for good or ill.