Documentation

Our database, inc. income at block level

This describes an integrated database control structure created for use with the 2000 census and the 1990 census. There is a great deal of similarity in content from one census to another, but there are important, sometimes subtle, differences that complicate analysis and comparisons. Retrieval is complicated by the fact that the names for data items differ from one source to another. The structures of tables also change, and geographic boundaries changed from 1990 to 2000.

This system integrates the four censuses and the HUD low-mode database, making it possible to query easily for what is available, to choose among sources, and to make comparisons.

The text below describes our database structure in greater detail.

The principles that make this structure work are as follows. Every data table is given an explicit name, which is then enrolled in a "registry" (5001 13). Each name has an abbreviation (Abbr) that is rigidly protected and controlled and is never changed. ("Abbr" is not case sensitive: we prefer upper/lower (u/l) case while HUD prefers all upper case, and the two are treated as the same. In practice, though, we will be religiously careful to preserve our u/l case in all tables we control, so that our data retrieval systems are not slowed by the need to be case insensitive.) The full "explicit" name may be modified now and then; it is an "attribute". For ease of reading and conversation, we use the term "DataItem" (no space) as a synonym for "Abbr".
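The registry rules above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name `Registry`, its methods, and the sample abbreviation `MedHHInc` are hypothetical and do not come from the actual system. It shows an Abbr that cannot change once enrolled, a full name that can, and exact-case lookup.

```python
class Registry:
    """Hypothetical registry: an Abbr is fixed once enrolled; the full name is an attribute."""

    def __init__(self):
        self._names = {}  # Abbr (stored in its original u/l case) -> full explicit name

    def enroll(self, abbr, full_name):
        # Abbr is not case sensitive, so refuse one that differs only by case
        # from an Abbr that is already enrolled.
        for existing in self._names:
            if existing.lower() == abbr.lower():
                raise ValueError(f"Abbr already enrolled: {existing}")
        self._names[abbr] = full_name

    def rename(self, abbr, new_full_name):
        # Only the full name may be modified; the Abbr itself is never altered.
        if abbr not in self._names:
            raise KeyError(abbr)
        self._names[abbr] = new_full_name

    def full_name(self, abbr):
        # Exact-case lookup, which is possible because the stored case is preserved.
        return self._names[abbr]
```

Checking for case collisions only at enrollment keeps the cost out of ordinary retrieval, which is the trade-off the paragraph above describes.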

The data items in each source are coded to a "DataItem", with new DataItems added as needed. When we import a file from a source, the field names and other attributes assigned by that source remain in the imported file. Our "DataItem" does not appear in the source file. This makes it easy to accept new data. Integration with our structure occurs outside the source files. The most important place where this integration occurs is a list which includes the following columns:

  1. Abbr - (DataItem) the highly controlled "Abbreviation" from our Data Registry.
  2. File - any source file which includes this DataItem.
  3. Table - the table in the source file containing this data, if relevant (e.g. SF1, SF3, STF1, STF3).
  4. NFlds - the number of fields in the table (often 1).
  5. Date - if applicable, usually a year (e.g. 1990 census or 2000 census).
  6. Comment - any additional comment useful in understanding how this data source differs from others.
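As a rough illustration, the six columns above could be held as rows of a simple record type. Everything in this sketch is an assumption made for the example: the name `Entry`, the function `sources_for`, the file names, and the sample rows are invented and are not taken from the real list.

```python
from collections import namedtuple

# One row of the cross-reference list, with the six columns described above.
Entry = namedtuple("Entry", ["abbr", "file", "table", "nflds", "date", "comment"])

# Made-up rows, for illustration only.
CROSSREF = [
    Entry("MedHHInc", "census2000_file", "SF3", 1, 2000, "hypothetical row"),
    Entry("MedHHInc", "census1990_file", "STF3", 1, 1990, "hypothetical row"),
    Entry("TotPop", "census2000_file", "SF1", 1, 2000, "hypothetical row"),
]


def sources_for(abbr, entries=CROSSREF):
    """Return every row that carries the given DataItem (exact-case match)."""
    return [e for e in entries if e.abbr == abbr]
```

For example, `sources_for("MedHHInc")` returns the two rows for that DataItem, one per census, which is the starting point for choosing among sources or comparing them.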

How this facility is used will evolve. The minimal use is to translate DataItems from one database to another. The ultimate use might be to provide a central place where all DataItems in a complex database are enrolled. For the time being, we'll use it in the minimal way. If we ever want to, we can write a program that builds it out to the ultimate use. Alternatively, we could build the all-inclusive "ultimate" index (including all databases enrolled) on the fly whenever it is needed. That would simplify maintenance of the core translator.
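Both uses can be sketched briefly. The sketch assumes, purely for illustration, that each database keeps its own small mapping from DataItem to local field name; the function names and the field names `FLD_A`, `FLD_B`, and `FLD_C` are hypothetical.

```python
def translate(translators, field_name, from_db, to_db):
    """Minimal use: map a field name in one database to its name in another.

    translators: dict of database name -> {DataItem: local field name}.
    Returns None when no matching DataItem is found in either database.
    """
    for data_item, local_name in translators[from_db].items():
        if local_name == field_name:
            return translators[to_db].get(data_item)
    return None


def build_ultimate_index(translators):
    """Ultimate use, built on the fly: DataItem -> {database name: local field name}."""
    index = {}
    for database, mapping in translators.items():
        for data_item, local_name in mapping.items():
            index.setdefault(data_item, {})[database] = local_name
    return index


# Made-up translation lists for two enrolled databases.
TRANSLATORS = {
    "census1990": {"MedHHInc": "FLD_A"},
    "census2000": {"MedHHInc": "FLD_B", "TotPop": "FLD_C"},
}
```

Because the all-inclusive index is derived from the per-database lists each time, only those lists need to be maintained by hand.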

The "integrated database control structure" allows one to approach the database asking "where can I find data on my subject?" Often, however, the databases are accessed with the source files already chosen during an initialization procedure. When seeking information in a file that has already been chosen, we first try to find the file location in the simplest way. Then we try using RA[2] as a field name or key word. If that fails, we look up RA[2] in the integrated database structure. If the entire array is not desired, RA[3] is used to select elements (or columns or rows).
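The fallback order for a file that has already been chosen might look like the sketch below. It rests on several assumptions that the text does not settle: RA is modeled as a Python list whose positions 2 and 3 hold the name and the optional selector, the chosen file is modeled as a dictionary of field name to values, and the function name `resolve` is hypothetical.

```python
def resolve(ra, file_fields, crossref):
    """Fetch the data named by ra[2] from an already-chosen file.

    ra:          request array; ra[2] is a field name, key word, or DataItem,
                 and ra[3], if present, lists the positions of elements to keep.
    file_fields: dict of field name -> list of values in the chosen file.
    crossref:    dict of DataItem -> field name in the chosen file (assumed form
                 of the integrated structure, narrowed to this one file).
    """
    key = ra[2]
    if key in file_fields:
        # First attempt: ra[2] is used directly as a field name.
        values = file_fields[key]
    elif key in crossref and crossref[key] in file_fields:
        # Fallback: look ra[2] up in the integrated structure.
        values = file_fields[crossref[key]]
    else:
        raise KeyError(key)

    selector = ra[3] if len(ra) > 3 else None
    if selector is None:
        return values  # the entire array is wanted
    return [values[i] for i in selector]
```

Trying the direct field name first keeps the common case cheap; the integrated structure is consulted only when the direct lookup fails.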

The "integrated database control structure" is oriented to tables. However, sometimes a portion of a table needs to be enrolled in the common registry. At the moment, "cf‘fn" is our way of doing that.
