Table 10. Technical Planning Questions

  • Who is performing the linkage? Are the individuals performing the linkage permitted access to identifiers or restricted sets of identifiers? Are they neutral agents (“honest brokers”) or the source of one of the datasets to be linked?
  • How easy will it be to know whether a given person is in the registry? Are censuses riskier than surveys?
  • Is there a common feature or pseudonym (a set of attributes present in both databases that is unique to each individual but does not lead to re-identification) available across the datasets being linked?
  • Is the registry a flat file or a relational database? The latter is more difficult to manage unless a primary key is applied.
  • Is the registry relatively static or dynamic? A dynamic registry is harder to manage, because the risk of identification increases as data are added over time.
  • How many attributes are in the registry? The more attributes, the harder it will be to manage the risk of identification associated with the registry.
  • How will conflicting values of attributes that are common to both databases be resolved? Comparable attributes (e.g., weight) should be converted to the same units of measurement in datasets that will be linked.
  • Does the registry contain information that makes the risk of identification intrinsic to the registry? Direct identifiers such as names and Social Security Numbers are problematic, as is fine-scale geography.
  • Is there a sound data dictionary?
  • How many external databases will be linked to the registry data? How readily available and costly is each external database?
  • How will records that appear in only one database be managed?
  • How will the accuracy of the linked dataset relate to the accuracy of its components? The linked dataset is only as accurate as its least accurate component.

From: Chapter 7, Linking Registry Data: Technical and Legal Considerations
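A minimal sketch, in Python, of how several of the questions above might be handled in practice: a pseudonym built as a keyed hash (HMAC-SHA256) of attributes common to both datasets, conversion of a comparable attribute (weight) to a single unit, flagging of conflicting values for review rather than silently choosing one, and separation of records that appear in only one dataset. Everything named here is an illustrative assumption, not a method prescribed by the chapter: the broker-held key `BROKER_KEY`, the linking fields in `LINK_FIELDS`, the 1 kg conflict tolerance, the helper functions, and the choice of HMAC itself.

```python
import hashlib
import hmac

# ASSUMPTION: a secret key held only by the honest broker doing the linkage.
BROKER_KEY = b"replace-with-a-long-random-secret"
LB_PER_KG = 2.20462           # pounds per kilogram
CONFLICT_TOLERANCE_KG = 1.0   # hypothetical tolerance for "conflicting" weights
LINK_FIELDS = ("last_name", "birth_date", "sex")  # hypothetical shared attributes


def pseudonym(record):
    """Keyed hash (HMAC-SHA256) of the normalized linking attributes."""
    message = "|".join(str(record[f]).strip().lower() for f in LINK_FIELDS)
    return hmac.new(BROKER_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def weight_kg(record):
    """Express weight in kilograms so both datasets use the same unit."""
    unit = record["weight_unit"]
    if unit == "kg":
        return record["weight"]
    if unit == "lb":
        return record["weight"] / LB_PER_KG
    raise ValueError(f"unknown weight unit: {unit!r}")


def link(registry, external):
    """Link on pseudonym, keeping only the pseudonym and non-identifying values.

    Assumes at most one record per person in each dataset.
    """
    reg = {pseudonym(r): r for r in registry}
    ext = {pseudonym(r): r for r in external}
    linked = []
    for pid in sorted(reg.keys() & ext.keys()):
        w_reg, w_ext = weight_kg(reg[pid]), weight_kg(ext[pid])
        linked.append({
            "pid": pid,
            "registry_weight_kg": round(w_reg, 1),
            "external_weight_kg": round(w_ext, 1),
            # Flag disagreement for review instead of silently picking one value.
            "weight_conflict": abs(w_reg - w_ext) > CONFLICT_TOLERANCE_KG,
        })
    # Records found in only one dataset are returned separately, not dropped.
    registry_only = sorted(reg.keys() - ext.keys())
    external_only = sorted(ext.keys() - reg.keys())
    return linked, registry_only, external_only


registry = [
    {"last_name": "Smith", "birth_date": "1970-01-02", "sex": "F", "weight": 70.0, "weight_unit": "kg"},
    {"last_name": "Jones", "birth_date": "1965-05-05", "sex": "M", "weight": 80.0, "weight_unit": "kg"},
]
external = [
    {"last_name": "SMITH ", "birth_date": "1970-01-02", "sex": "f", "weight": 154.3, "weight_unit": "lb"},
    {"last_name": "Brown", "birth_date": "1980-03-03", "sex": "M", "weight": 90.0, "weight_unit": "kg"},
]

linked, registry_only, external_only = link(registry, external)
# prints "linked: 1, registry only: 1, external only: 1"
print(f"linked: {len(linked)}, registry only: {len(registry_only)}, external only: {len(external_only)}")
```

The sketch leaves open what to do with the unmatched records, which is the policy decision the table asks planners to make. A keyed hash of low-entropy attributes such as names and birth dates can be reversed by anyone who obtains the key, so custody of the key matters as much as the hashing itself.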

