Thursday 23 October 2008

Archiving your timeline with Tweet-SQL

This TSQL script uses Tweet-SQL to archive all of the authenticating users tweets to a local SQL Server table. If you have a large number of Tweets to archive you may wish to break up the archiving by keeping track of the page number you are requesting. You can also modify the WAITFOR DELAY value to avoid hitting the API limit.

First create a table to hold our timeline....

   1:  CREATE TABLE TwitterArchive
   2:  (
   3:      created_at NVARCHAR(30),
   4:      id INT PRIMARY KEY,
   5:      [text] NVARCHAR(140),
   6:      source NVARCHAR(100),
   7:      truncated NVARCHAR(5),
   8:      in_reply_to_status_id INT,
   9:      in_reply_to_user_id INT,
  10:      favorited NVARCHAR(5),
  11:      status_Id INT,
  12:      statuses_Id INT                                            
  13:  );


This script will archive all of the authenticating users timeline a page at a time. Make adjustments to this script as your situation dictates.

   1:  DECLARE @xml XML,
   2:          @handle INT,
   3:          @page INT,
   4:          @optional NVARCHAR(10),
   5:          @rowcount TINYINT;
   6:   
   7:  SET @page = 1;
   8:   
   9:  -- Turn of resultsets from Tweet-SQL
  10:  EXEC dbo.tweet_cfg_resultset_send 0;
  11:   
  12:  -- Set the optional parameter to request page 1
  13:  SET @optional = '?page=' + CAST(@page AS NVARCHAR(4));
  14:  -- Get page 1 of your timeline
  15:  EXEC dbo.tweet_sts_user_timeline null, @optional, @xml OUTPUT;
  16:  SET @rowcount = 20;
  17:   
  18:  WHILE (@rowcount = 20) -- While we still have results to deal with
  19:  BEGIN
  20:      -- Prepare an xml document
  21:      EXEC sp_xml_preparedocument @handle OUTPUT, @xml;
  22:   
  23:      -- Insert the page into a table
  24:      INSERT INTO dbo.TwitterArchive
  25:      (
  26:          created_at,
  27:          id,
  28:          [text],
  29:          source,
  30:          truncated,
  31:          in_reply_to_status_id,
  32:          in_reply_to_user_id,
  33:          favorited,
  34:          status_Id,
  35:          statuses_Id    
  36:      )
  37:      SELECT  created_at,
  38:              id,
  39:              [text],
  40:              source,
  41:              truncated,
  42:              in_reply_to_status_id,
  43:              in_reply_to_user_id,
  44:              favorited,
  45:              status_Id,
  46:              statuses_Id
  47:      FROM OPENXML (@handle, '/statuses/status', 2)
  48:      WITH
  49:      (
  50:          created_at NVARCHAR(30),
  51:          id INT,
  52:          [text] NVARCHAR(140),
  53:          source NVARCHAR(100),
  54:          truncated NVARCHAR(5),
  55:          in_reply_to_status_id INT,
  56:          in_reply_to_user_id INT,
  57:          favorited NVARCHAR(5),
  58:          status_Id INT,
  59:          statuses_Id INT
  60:      );
  61:   
  62:      -- Get the rowcount, when < 20 then we have the last page
  63:      SET @rowcount = @@ROWCOUNT;
  64:      -- Which page have we done?
  65:      PRINT 'Archived page ' + CAST(@page AS NVARCHAR(4));
  66:      -- destroy the xml document
  67:      EXEC sp_xml_removedocument @handle;
  68:      -- Increment the page count
  69:      SET @page = @page + 1;
  70:      -- Setup the optional parameter
  71:      SET @optional = '?page=' + CAST(@page AS NVARCHAR(4));
  72:      -- Wait for a bit...
  73:      WAITFOR DELAY '00:00:05';
  74:      -- Get the next page
  75:      EXEC dbo.tweet_sts_user_timeline null, @optional, @xml OUTPUT;
  76:  END
  77:   
  78:  -- Turn resultsets from Tweet-SQL back on as appropriate
  79:  EXEC dbo.tweet_cfg_resultset_send 1;


SQL Server Management Studio will return something like the below on completion.

200 OK: everything went awesome.

(20 row(s) affected)
Archived page 1
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 2
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 3
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 4
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 5
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 6
200 OK: everything went awesome.

(20 row(s) affected)
Archived page 7
200 OK: everything went awesome.

(9 row(s) affected)
Archived page 8
200 OK: everything went awesome.

Blogged with the Flock Browser

No comments: