Paging through the Twitter stream
Working with an API can be a hassle—especially if you have no control over it. Twitter’s API has its fair share of obstacles, one of which is paging. When you first look at a method in the Twitter API, you will most likely go straight to the page parameter. It’s simple and expected—page 1 is the most recent and the page number increases as you navigate back in time. This seems like the right solution, but it’s not. The page parameter has a major downside.

Since page 1 is always the most recent, the entire paging system is relative and shifts with every new tweet. If your user pages back in time while new tweets arrive in the Twitter database (for his stream), tweets he has already seen will slide into the next page. For example, with the count parameter set to 20, if the user pages older once and 20 new tweets arrive in the Twitter database, he will see the exact same page the next time he pages older.
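The drift is easy to reproduce offline. The following is a toy simulation, not a call to Twitter: the timeline is a plain list of hypothetical tweet ids (newest first), and `get_page` and `add_new_tweets` are illustrative helpers that mimic page/count paging.

```python
from typing import List


def get_page(timeline: List[int], page: int, count: int = 20) -> List[int]:
    """Return page `page` (1-based) of a newest-first timeline, like page+count paging."""
    start = (page - 1) * count
    return timeline[start:start + count]


def add_new_tweets(timeline: List[int], n: int) -> List[int]:
    """Prepend n new tweets whose ids are higher than the current newest id."""
    newest = timeline[0] if timeline else 0
    return list(range(newest + n, newest, -1)) + timeline


timeline = list(range(100, 0, -1))        # ids 100..1, newest first
page2 = get_page(timeline, 2)             # user pages older once: ids 80..61
timeline = add_new_tweets(timeline, 20)   # 20 new tweets arrive (ids 120..101)
page3 = get_page(timeline, 3)             # user pages older again
print(page2 == page3)                     # True: the "older" page is the same page
```

Because every page number is counted from the newest tweet, any new arrivals push old tweets down by exactly that many slots.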

This issue can be easily dodged by using the max_id parameter. With it, you set the highest tweet id to appear in the stream, excluding all tweets newer than it. For paging, set max_id to one less than the id of the bottom (oldest) tweet currently shown. There are, however, a few downsides. Since there is no min_id parameter, you need to store all of the previous max_id values so you can find your way back to newer pages (assuming you run your app off the raw API data and not a local database). Also, I’ve noticed Twitter caps the number of tweets you can retrieve at about 700, or 2 months (I’m not sure whether this is a count-based or a date-based limit). This is interesting, considering Twitter indicates in the API docs that it limits the number of retrievable tweets to 3,200. The docs do note this limit under “pagination limits” for the page and count parameters, but I assumed we would see the same amount for max_id.
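A minimal sketch of max_id paging with a stack of earlier max_id values, assuming an in-memory stand-in for the API. `fetch`, `MaxIdPager`, `older` and `newer` are hypothetical names; only the max_id rule (return tweets with id less than or equal to max_id, newest first) comes from the text above.

```python
from typing import List, Optional


def fetch(timeline: List[int], max_id: Optional[int] = None, count: int = 20) -> List[int]:
    """Hypothetical stand-in for a timeline call: tweets with id <= max_id, newest first."""
    eligible = [t for t in timeline if max_id is None or t <= max_id]
    return eligible[:count]


class MaxIdPager:
    """Pages older with max_id and keeps a stack of earlier max_id values to page newer."""

    def __init__(self, timeline: List[int], count: int = 20):
        self.timeline = timeline
        self.count = count
        self.max_id: Optional[int] = None        # max_id of the page on screen; None = newest
        self.history: List[Optional[int]] = []   # max_id values of the pages we came from
        self.page = fetch(timeline, None, count)

    def older(self) -> List[int]:
        if not self.page:
            return self.page                     # nothing further back to anchor on
        self.history.append(self.max_id)
        self.max_id = self.page[-1] - 1          # one less than the bottom tweet's id
        self.page = fetch(self.timeline, self.max_id, self.count)
        return self.page

    def newer(self) -> List[int]:
        if not self.history:
            return self.page                     # already on the first page loaded
        self.max_id = self.history.pop()         # no min_id, so replay the stored max_id
        self.page = fetch(self.timeline, self.max_id, self.count)
        return self.page


timeline = list(range(100, 0, -1))               # ids 100..1, newest first
pager = MaxIdPager(timeline)
second = pager.older()                           # ids 80..61
timeline[:0] = range(120, 100, -1)               # 20 new tweets arrive meanwhile
third = pager.older()                            # ids 60..41, no repeats
back = pager.newer()                             # ids 80..61 again
```

Because each page is anchored to a tweet id rather than a position, the 20 new tweets no longer push duplicates into the next page; the price is the `history` stack needed to walk back.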
Now, another issue with Twitter pagination, and the original reason I decided to write this post, is gaps in the Twitter stream. If your Twitter app caches tweets to a local database, this is for you. Let’s say someone uses your application once, takes a day off, and returns the next day. In the meantime, thousands upon thousands of tweets entered the Twitter database, and possibly your user’s stream. When this user starts the application again, it loads the latest tweets, but there is still a gap: a slew of tweets from the previous day that were never loaded. What do you do?
Use a combination of the max_id and since_id parameters. Set since_id to the id of the newest tweet cached from the previous session, then load older pages using max_id. When the response comes back empty, you’ve either caught up to the most recent tweet from the previous session or reached Twitter’s limit.
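A sketch of that gap-filling loop, again against a hypothetical in-memory `fetch` that applies both bounds (since_id exclusive, max_id inclusive). `fill_gap` and its `max_pages` safety bound are assumptions for illustration. As with the real API, an empty page here cannot tell you whether you caught up with the cache or ran into a retrieval limit.

```python
from typing import List, Optional


def fetch(timeline: List[int], since_id: Optional[int] = None,
          max_id: Optional[int] = None, count: int = 20) -> List[int]:
    """Hypothetical stand-in: tweets with since_id < id <= max_id, newest first."""
    eligible = [t for t in timeline
                if (since_id is None or t > since_id)
                and (max_id is None or t <= max_id)]
    return eligible[:count]


def fill_gap(timeline: List[int], newest_cached_id: int,
             count: int = 20, max_pages: int = 50) -> List[int]:
    """Fetch every tweet newer than newest_cached_id, paging older with max_id."""
    fetched: List[int] = []
    max_id: Optional[int] = None                 # first request returns the newest page
    for _ in range(max_pages):                   # hypothetical safety bound on requests
        page = fetch(timeline, since_id=newest_cached_id, max_id=max_id, count=count)
        if not page:                             # caught up with the cache, or hit the limit
            break
        fetched.extend(page)
        max_id = page[-1] - 1                    # continue just below the oldest tweet seen
    return fetched


timeline = list(range(175, 0, -1))               # the user's stream today, newest first
missing = fill_gap(timeline, newest_cached_id=100)   # cache ended at id 100 yesterday
```

Here the loop makes four non-empty requests (20, 20, 20 and 15 tweets) and stops on the fifth, empty one, having recovered ids 175 down to 101.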
A few of Twitter’s newer methods use cursors for paging, where each result includes two numbers that are used to navigate to the newer or older page. I personally haven’t used any of these methods, so it’s unclear to me whether the numbers are relative or absolute (hopefully the latter). In any case, I look forward to seeing cursors in some of the older methods, like favorites, which doesn’t even have since_id or max_id yet.
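Whatever the cursor values mean internally, the client side of cursored paging is a simple loop. This sketch assumes the conventions commonly described for Twitter’s cursored methods (request the first page with cursor -1, read `next_cursor` and `previous_cursor` from each response, stop when `next_cursor` is 0); the response dictionary and `make_fake_fetch` are hypothetical. Note that this fake encodes cursors as list offsets, so it is relative and would drift exactly like page; an absolute cursor would instead encode a fixed boundary such as a tweet id.

```python
from typing import Callable, Dict, List


def make_fake_fetch(ids: List[int], count: int = 20) -> Callable[[int], Dict]:
    """Hypothetical cursored method; this fake's cursors are list offsets (relative)."""
    def fetch(cursor: int = -1) -> Dict:
        start = 0 if cursor == -1 else cursor    # -1 asks for the first page
        page = ids[start:start + count]
        end = start + len(page)
        return {
            "ids": page,
            "next_cursor": end if end < len(ids) else 0,            # 0: no older page
            "previous_cursor": (start - count if start > count else -1)
                               if start > 0 else 0,                 # 0: no newer page
        }
    return fetch


def collect_all(fetch: Callable[[int], Dict]) -> List[int]:
    """Follow next_cursor from -1 until the method reports 0."""
    items: List[int] = []
    cursor = -1
    while True:
        resp = fetch(cursor)
        items.extend(resp["ids"])
        cursor = resp["next_cursor"]
        if cursor == 0:
            break
    return items
```

Navigating newer is symmetric: pass the response’s `previous_cursor` back to the same method.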

