Importing Large Stata Files

Recently I encountered a problem when trying to use a large Stata file (nearly 10 GB). The file contained data for the period 1981 to 2011, but I only needed data for the period 1991 to 2009. To complicate matters, I initially didn’t even know the names of the variables in the file, a problem that can be resolved without loading the data into memory:

describe using "filename"

In this case, knowing the variable names turned out to be unimportant. Instead, after a bit of trial and error, I ended up importing the observations in batches of 1 million at a time. Below is the code for two such batches.

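*STEP 1
* Load only observations 8,000,001 through 9,000,000
clear
use "1980-2011.dta" in 8000001/9000000
* Ownership stake as a fraction, rounded to two decimal places
gen pct = round((shares / outstanding), .01)
* Keep stakes of 5% or more; exclude missing values
keep if pct >= .05 & pct != .
compress
save blockholders, replace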

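*STEP 2
* Load the next batch: observations 9,000,001 through 10,000,000
clear
use "1980-2011.dta" in 9000001/10000000
gen pct = round((shares / outstanding), .01)
keep if pct >= .05 & pct != .
compress
* Add the qualifying observations saved by the earlier steps
append using blockholders
save blockholders, replace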

*STEP N

Step 1 imports a chunk of 1 million observations and keeps only those in which an investor owns 5% or more of a particular company (measured after rounding the ownership fraction to two decimal places). About 22,000 out of one million observations meet this criterion, and these ~22,000 observations are saved. In Step 2 the procedure is repeated on the next chunk: its ~22,000 qualifying observations are combined with the existing blockholders file, and the file is saved again. The procedure is then repeated for each remaining chunk until all the observations have been evaluated and only those relevant to my research project have been retained.
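For anyone who would rather not copy and paste each step, the same procedure could probably be written as a loop. What follows is only a sketch, not the code I actually ran. The file name and the variables shares and outstanding come from the steps above. The local macro names (total, batch, start, stop) are my own, and the sketch assumes that describe using reports the number of observations in r(N) and that the loop starts from the first observation of the file.

```stata
* Sketch only: process the whole file in batches of 1 million observations.
* Assumes "1980-2011.dta" contains the variables shares and outstanding.
clear

* Read the number of observations without loading the data
describe using "1980-2011.dta", short
local total = r(N)
local batch = 1000000

forvalues start = 1(`batch')`total' {
    * The last batch may be shorter than the others
    local stop = min(`start' + `batch' - 1, `total')

    use "1980-2011.dta" in `start'/`stop', clear
    gen pct = round((shares / outstanding), .01)
    keep if pct >= .05 & pct != .
    compress

    * From the second batch onward, add the observations kept so far
    if `start' > 1 {
        append using blockholders
    }
    save blockholders, replace
}
```

A smaller batch size should lower peak memory use at the cost of more passes through the loop, so the value of batch is worth adjusting to the machine at hand.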

Skype Recorder

I’ve been looking for a way to record Skype conversations. In particular, when conducting research interviews I like to be able to wear a headset so as to have my hands free for typing notes. Several months ago I tried Call Graph, which seemed like a promising solution. However, I could never get it to work without introducing an echo and feedback loop that was audible on both sides of the call. Last week I tried out a free application called MP3 Skype Recorder v1.9.0.1 and so far it has been working well for me.

MP3 Skype Recorder

Penn State, Gmail and SMTP

I’ve been using Gmail instead of Outlook for about a year now. However, despite configuring Gmail to list my “from” address as @psu.edu, some recipients were still seeing my messages come through as “on behalf of” first.last@gmail.com. Although this was bothersome, I never got around to figuring out why.

Today I finally decided to see if it could be fixed. It turns out Google has addressed this problem. First, read this Gmail blog post. For most users, this should be enough. However, in the case of my Penn State account, it took some digging to find the settings. In particular, for Penn State you will need to use authsmtp.psu.edu (and not smtp.psu.edu) as your outgoing mail server. Also, use port 587. However, contrary to the instructions in the Penn State knowledge base, do NOT check the box for SSL encryption.

Favorite Netbook Applications

In anticipation of a summer of traveling (vacation in Portland, Eugene and Cannon Beach, Oregon; and academic conferences/presentations in Barcelona, Florence, and Chicago) I bought a Samsung NC10 Netbook so as to avoid lugging around my albatross of a Dell Latitude laptop. Not only is the Samsung Netbook perfect for traveling, I love it so much that I now use it almost daily. Other than adding a 2 GB memory upgrade (from Crucial of course), my unit is stock (Win XP Home, etc). The battery life is amazing (7-8 hours). The nearly-full-sized keyboard works great.

As part of the transition from laptop to netbook I decided it was time to get rid of some of the bloatware applications I have long taken for granted.  By bloatware I am referring to applications that chew up lots of computing resources or lots of economic resources — or both.

My first worry was keeping my email in sync across two laptops. Since most of my email is picked up using POP and not IMAP, I knew keeping two copies of Outlook in sync would be a pain. But as a longtime Outlook user I was very reluctant to consider giving it up. After considering a bunch of options I decided on Gmail. I had signed up for a Gmail account circa 2005, but had not opened it in years. After about 4 months of using Gmail exclusively I have become a total Gmail convert. I do not miss Outlook at all. More impressive, I even managed to use Gmail’s IMAP support to push my most recent 2 years (about 2 GB) worth of email from Outlook to Gmail, putting it all in the cloud for easy access anytime, anywhere. Another plus is the Gmail “offline” feature, which allows you to compose and reply to emails offline, just like Outlook. Similarly, I now use Google Contacts and Calendar.

My second biggest concern was having access to all my files from both laptops, and easily keeping them in sync. For this I decided on a program called Windows Live Sync. This free application (formerly called Windows Live Foldershare) does an amazing job of synchronizing several gigabytes and several thousand files across both machines. Now I can work on any document I want from either machine. The only thing better would be having all my documents in the cloud, but I could not find any free services offering enough storage space.

Some other newfound favorites: For browsing I use Google Chrome exclusively. Not only is it the lightest browser I have tried, it also takes up the least screen space. With only 600 vertical pixels on my Netbook screen every pixel counts, and Chrome lets me see more of the pages I am browsing than Explorer, Firefox, etc. I also replaced Adobe Acrobat Reader (something like 90 MB) with Foxit Reader (closer to 5 MB). For virus protection, after the 90-day free trial of Symantec expired, I switched to AVG Antivirus Free Edition. For opening and creating file archives I use 7-Zip, an open source application which not only reads .ZIP files, but also .RAR and about a dozen others. For FTP I use another open source package called FileZilla. For photo management I love Picasa 3 and its companion Web Albums. Finally, for photostitching I found a terrific free program called Autostitch. For example, this panoramic picture of the Villa La Pietra gardens stitches together 9 different photographs.

PDF and Kindle

After much trial and error I have finally discovered a way to read PDFs of academic articles and other documents with technical formatting on my Kindle 2.

The solution is PDFRead 1.8.2. To get started, first download the “pdfread-1.8.2-Installer.zip” file available here. Second, after downloading and installing PDFRead, you **must** also install the file “NRhtml2mobi.exe” as described in the 8-Mar-2009 edit to the above post.

Third, in terms of settings, I recommend the same settings proposed here and shown below.

Recommended PDFRead 1.8.2 Kindle Settings

Fourth, after using PDFRead to convert a PDF to a PRC file, simply connect your Kindle to your desktop via USB and drag and drop the file to your Kindle’s “documents” folder. Happy reading.

BTW, in case you are wondering, before arriving at this solution I tried and rejected Amazon’s PDF conversion service, MobiPocket Creator (from a company owned by Amazon), and Jesse Vincent’s Savory (Version .10, an open source Kindle hack that converts PDFs on the fly). None of these solutions worked for me. By that I mean the formatting of academic journal articles was not carried over cleanly from PDF to Kindle, at least not to my satisfaction. In short, as far as I can tell, at this point the only solution better than PDFRead would be native PDF file support on the Kindle from Amazon.