Importing Large Stata Files

Recently I encountered a problem when trying to use a large Stata file (nearly 10 GB). The file contained data for the period 1981 to 2011, but I only needed data for the period 1991 to 2009. To complicate matters, I initially didn't even know the names of the variables in the file, a problem that can be resolved with:

describe using "filename.dta"

which reads the file's header and lists its variables without loading any data into memory.

In this case, it turns out that knowing the variable names proved unimportant. Instead, after a bit of trial and error, I ended up importing batches of observations (1 million observations at a time). Below is the code for several such batches.

*STEP 1
clear
use "1980-2011.dta" in 8000001/9000000
gen pct = round((shares / outstanding), .01)
keep if pct >= .05 & !missing(pct)
compress
save blockholders, replace

*STEP 2
clear
use "1980-2011.dta" in 9000001/10000000
gen pct = round((shares / outstanding), .01)
keep if pct >= .05 & !missing(pct)
compress
append using blockholders
save blockholders, replace

*STEP N

Step 1 imports a chunk of 1 million observations and keeps only those in which an investor owns 5% or more of a particular company. About 22,000 of the million observations meet this criterion, and those ~22,000 observations are saved. (Note that missing values sort above any number in Stata, so the pct >= .05 condition alone would also keep observations with a missing pct; the !missing(pct) condition excludes them.) In Step 2, the procedure is repeated on the next chunk, and another ~22,000 qualifying observations are appended to the blockholders file before it is saved again. The procedure is repeated N times until all the observations have been evaluated, leaving only those relevant to my research project.
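
Rather than copying the block N times, the whole procedure can be collapsed into a single forvalues loop. The sketch below assumes the same file and variable names used above; describe using reads the total observation count from the file's header, so the last chunk is sized correctly and the number of chunks does not have to be hard-coded:

clear
describe using "1980-2011.dta"
local N = r(N)                        // total observations in the file
local chunk = 1000000                 // observations per batch
local nchunks = ceil(`N' / `chunk')
forvalues i = 1/`nchunks' {
    clear
    local first = (`i' - 1) * `chunk' + 1
    local last = min(`i' * `chunk', `N')
    use "1980-2011.dta" in `first'/`last'
    gen pct = round((shares / outstanding), .01)
    keep if pct >= .05 & !missing(pct)
    compress
    if `i' > 1 {
        append using blockholders     // add the observations kept so far
    }
    save blockholders, replace
}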