GitHub Repository | Codebook generator Web-App
Codebooks / data dictionaries
Codebooks, also known as data dictionaries, describe the contents, structure, and layout of a dataset. They document the data and serve as a reference for analysis and interpretation, which helps other researchers understand and reuse it.
Key Components of a Codebook
A codebook is a table-level document that defines each variable of a dataset as clearly as possible. Consider including the following attributes:
Variable Name: A unique identifier for the variable in the data table (e.g., EMPLOY1 or VAR001).
Variable Label: A brief, discipline-specific description of the variable (e.g., "Employment Status").
Variable Type: The data type of the variable (e.g., numeric, integer, character, boolean).
Ranges or Labels: The range of values or the variable levels, depending on the type (e.g., "0-100", "Levels = A1, A2, A3").
Missing Values: The number of missing values, if any, in each column.
Units: The measurement units of the variable (e.g., "centimeters", "square meters").
Depending on the discipline, additional attributes may be needed to make the dataset understandable. Crystal Lewis offers codebook examples.
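The attributes listed above can be thought of as the fields of one record per variable. The following Python sketch shows one possible way to represent such a record. The class name `CodebookEntry` and its field names are hypothetical, chosen only for illustration, and the values reuse the examples given in the list (the missing-value count of 0 is assumed).

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CodebookEntry:
    """One codebook row: the description of a single variable."""
    name: str                     # Variable Name, e.g. "EMPLOY1"
    label: str                    # Variable Label, e.g. "Employment Status"
    var_type: str                 # numeric, integer, character, boolean
    range_or_levels: str          # e.g. "0-100" or "Levels = A1, A2, A3"
    missing: int                  # number of missing values in the column
    units: Optional[str] = None   # left empty when no unit applies


# Illustrative entry; the missing-value count is an assumed value.
entry = CodebookEntry(
    name="EMPLOY1",
    label="Employment Status",
    var_type="character",
    range_or_levels="Levels = A1, A2, A3",
    missing=0,
)

print(asdict(entry))
```

Keeping `units` optional reflects that categorical variables such as this one usually have no measurement unit.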
How to create a codebook
Creating a codebook is good research practice and should be done during the research process. Keep the format as simple as possible. The web-based codebook generator lets the user download a CSV codebook derived from a given data table.
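To illustrate what deriving a codebook from a data table can involve, here is a minimal Python sketch that uses only the standard library. It is not the implementation of the web-based generator. The function names `infer_type` and `build_codebook`, the sample data, and the rule that empty cells and "NA" count as missing are all assumptions made for this example.

```python
import csv
import io

MISSING = {"", "NA"}  # assumption: tokens treated as missing values


def infer_type(values):
    """Return 'numeric' if every value parses as a number, else 'character'."""
    if not values:
        return "character"
    try:
        for v in values:
            float(v)
    except ValueError:
        return "character"
    return "numeric"


def build_codebook(rows):
    """Build one codebook row per column from a list of dict rows."""
    codebook = []
    for name in rows[0].keys():
        column = [row[name] for row in rows]
        present = [v for v in column if v not in MISSING]
        var_type = infer_type(present)
        if var_type == "numeric":
            numbers = [float(v) for v in present]
            range_levels = f"{min(numbers):g}-{max(numbers):g}"
        else:
            range_levels = ", ".join(sorted(set(present)))
        codebook.append({
            "Variable": name,
            "Label": "",  # labels cannot be inferred; fill them in by hand
            "Type": var_type,
            "Range-Levels": range_levels,
            "Missing values": len(column) - len(present),
        })
    return codebook


# Hypothetical data table with one missing Age value.
sample = """Intervention,Age,Score
G1,18,12
G2,,20
G3,26,1
"""
rows = list(csv.DictReader(io.StringIO(sample)))
codebook = build_codebook(rows)

# Write the codebook as CSV to an in-memory buffer.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(codebook[0].keys()))
writer.writeheader()
writer.writerows(codebook)
print(out.getvalue())
```

To save the result to disk, the in-memory buffer can be replaced with a file opened via `open("codebook.csv", "w", newline="")`. Variable labels and units still need to be written by someone who knows the data, since they cannot be derived from the table itself.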
Example of a codebook
| Variable | Label | Type | Range-Levels | Missing values |
|---|---|---|---|---|
| Stage | Experimental stage | Factor | 1, 2, 3, 4 | NA |
| Intervention | Intervention Group | Factor | G1, G2, G3 | NA |
| Age | Participant age | Numeric | 18-26 | 1 |
| Sex | Biological sex | Factor | Men, Women | NA |
| Score | Cognitive score | Numeric | 1-20 | NA |
Codebooks are crucial for research transparency, reproducibility, and long-term data preservation.
