We quantified runtime, communication bandwidth, the size of the precomputed data, and the size of the initial data sharing (Online Methods) for the lung cancer, bladder cancer, and AMD data sets, as well as for simulated data sets of varying sizes obtained by subsampling the lung cancer data set (2K and 5K individuals) or duplicating the AMD data set (50K and 100K individuals). Because the number of SNPs differs between the data sets, we normalized all measurements to 500K SNPs for comparison, assuming a linear dependence on the number of SNPs. Lines show the best linear fit for each group. The observed trends are not perfectly linear because the fraction of individuals or SNPs passing quality control differs across data sets. Overall, our protocol achieves practical runtimes, and all of our performance measures scale linearly with the number of individuals. Phase 1: Quality control procedure. Phase 2: Population stratification analysis (PCA). Phase 3: Association tests.
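
The normalization and line fitting described in this caption can be illustrated with a minimal sketch. It assumes, as the caption does, that each measurement depends linearly on the number of SNPs, and it uses an ordinary least-squares fit as one plausible reading of "best linear fit". The function names and all input values are hypothetical placeholders chosen for illustration; they are not measurements from this study.

```python
REFERENCE_SNPS = 500_000  # normalization target stated in the caption


def normalize_to_reference(measurement, n_snps, reference=REFERENCE_SNPS):
    """Rescale a measurement to the reference SNP count.

    Assumes the measurement grows linearly with the number of SNPs.
    """
    return measurement * reference / n_snps


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical placeholder values, NOT data from the paper.
# Each tuple: (number of individuals, number of SNPs, measured value).
hypothetical_runs = [
    (2_000, 250_000, 1.0),
    (5_000, 250_000, 2.5),
    (10_000, 400_000, 8.0),
]

individuals = [run[0] for run in hypothetical_runs]
normalized = [normalize_to_reference(run[2], run[1]) for run in hypothetical_runs]

# Fit the normalized measure against the number of individuals.
slope, intercept = fit_line(individuals, normalized)
```

In this sketch, each group plotted in the figure (for example, each phase of the protocol) would be fitted separately by calling `fit_line` on that group's normalized measurements.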