Here are additional best practices to consider when creating tabular data:
8. Fill in All Cells
Do not leave cells empty. Empty cells in a column are likely to distort your dataset and may result in inaccurate data. To prevent empty cells in an Excel column, see this link. See good vs. bad practice below. We use a table from Zwar’s (2021) study as an example.
Good Practice:
Outcome Variables | Devaluing Feelings (working) | Devaluing Feelings (non-working) | Appreciative Feelings (working) | Appreciative Feelings (non-working) | Accusing Statements (working) | Accusing Statements (non-working) |
Caregiver’s gender (Ref. female) | 0.02 | 0.01 | −0.04 | −0.01 | 0.04 | 0.05 |
Constant | 1.63 | 1.81 | 3.19 | 3.30 | 1.83 | 2.07 |
Observations | 515 | 513 | 512 | 515 | 511 | 516 |
R² | 0.071 | 0.076 | 0.048 | 0.076 | 0.027 | 0.016 |
Bad Practice:
Outcome Variables | Devaluing Feelings (working) | Devaluing Feelings (non-working) | Appreciative Feelings (working) | Appreciative Feelings (non-working) | Accusing Statements (working) | Accusing Statements (non-working) |
Caregiver’s gender (Ref. female) | 0.02 | | −0.04 | −0.01 | 0.04 | 0.05 |
Constant | 1.63 | | 3.19 | 3.30 | 1.83 | |
Observations | 515 | | 512 | 515 | 511 | 516 |
R² | 0.071 | 0.076 | 0.048 | | 0.027 | 0.016 |
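Empty cells are easy to miss by eye, so it can help to scan for them before sharing a file. The following is a minimal sketch using only Python's standard library. The column names and the `find_empty_cells` helper are hypothetical, and the sample text stands in for a real CSV export with one cell left blank on purpose.

```python
import csv
import io

# Hypothetical CSV text standing in for a real file; the second value
# in the "Constant" row is deliberately blank.
SAMPLE_CSV = """outcome,devaluing_working,devaluing_nonworking
Constant,1.63,
Observations,515,513
"""


def find_empty_cells(csv_text):
    """Return (row_label, column_name) pairs for every blank cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    first_column = reader.fieldnames[0]
    empty = []
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header are not checked here.
                continue
            # A short row yields None; a blank cell yields "".
            if value is None or not value.strip():
                empty.append((row[first_column], column))
    return empty


empty_cells = find_empty_cells(SAMPLE_CSV)
print(empty_cells)
```

Each reported pair names the row and the column of a blank cell, which can then be filled with the real value or with an agreed missing-value code.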
9. Create a Data Dictionary
Data dictionaries describe each variable in data tables. In your data tables, include a header row with a short name (without spaces) for each variable. Use the data dictionary to link this short name to a longer text label for each variable, a description of the data, the data type (such as “integer” or “string”), the possible values, and the units of measurement. See good vs. bad practice below. The example is from Glenn (2020).
Good Practice:
PatientId | PatientAge | PatientSex | RiskFactors |
1 | 34 | M | Obesity |
2 | -999 | M | Cancer |
3 | 45 | F | Cancer |
4 | 38 | M | Smoking |
5 | -999 | M | NULL |
6 | 39 | F | Obesity |
7 | 48 | F | Smoking |
Data Dictionary
Variables | Definition | Type of Data | Possible Values |
PatientAge | Age of patient | Integer | 30-50 |
PatientSex | Sex of patient | String | M, F |
RiskFactors | Risk factor classification of patient | String | Obesity; Cancer; Smoking; NULL |
Bad Practice:
PatientId | PatientAge | PatientSex | RiskFactors |
1 | 34 | M | Obesity |
2 | -999 | M | Cancer |
3 | 45 | F | Cancer |
4 | 38 | M | Smoking |
5 | -999 | M | -999 |
6 | 39 | F | Obesity |
7 | 48 | F | Smoking |
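A data dictionary can also be used to check a table automatically. The sketch below encodes the dictionary from the good-practice example as a Python structure and reports values that fall outside it. The structure, the `check_value` and `validate` helpers, and the treatment of -999 as a missing-value code for PatientAge are assumptions made for illustration; they are not taken from Glenn (2020).

```python
# Hypothetical encoding of the data dictionary shown above.
DATA_DICTIONARY = {
    "PatientAge": {"type": "integer", "range": (30, 50)},
    "PatientSex": {"type": "string", "allowed": {"M", "F"}},
    "RiskFactors": {
        "type": "string",
        "allowed": {"Obesity", "Cancer", "Smoking", "NULL"},
    },
}

# Assumption: -999 marks a missing age, as the example tables suggest.
MISSING_CODE = -999


def check_value(name, raw):
    """Return a problem message for one cell, or None if it is valid."""
    spec = DATA_DICTIONARY[name]
    if spec["type"] == "integer":
        try:
            number = int(raw)
        except ValueError:
            return f"{name}: {raw!r} is not an integer"
        if number == MISSING_CODE:
            return None
        low, high = spec["range"]
        if not low <= number <= high:
            return f"{name}: {number} is outside {low}-{high}"
        return None
    if raw not in spec["allowed"]:
        return f"{name}: {raw!r} is not an allowed value"
    return None


def validate(rows):
    """Return (PatientId, message) pairs for every invalid cell."""
    problems = []
    for row in rows:
        for name in DATA_DICTIONARY:
            message = check_value(name, row[name])
            if message:
                problems.append((row["PatientId"], message))
    return problems


# Row 5 as it appears in the good and in the bad example.
good_row = {"PatientId": "5", "PatientAge": "-999",
            "PatientSex": "M", "RiskFactors": "NULL"}
bad_row = {"PatientId": "5", "PatientAge": "-999",
           "PatientSex": "M", "RiskFactors": "-999"}

print(validate([good_row]))
print(validate([bad_row]))
```

Under these assumptions, the good-practice row passes, while the bad-practice row is flagged because "-999" is not a documented value for RiskFactors.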
10. Save the Data in Plain Text Files
Keep a copy of the data files in a plain text format, with comma or tab delimiters. You can use comma-separated values (CSV) files (Broman & Woo, 2018). Researchers ought to do this because plain text makes it easier to exchange data between programs built on different architectures. See the example spreadsheet below, which can be saved as a CSV-formatted file. We use a table from Zwar’s (2021) study as an example.
Outcome Variables | Devaluing Feelings (working) | Devaluing Feelings (non-working) | Appreciative Feelings (working) | Appreciative Feelings (non-working) | Accusing Statements (working) | Accusing Statements (non-working) |
Caregiver’s gender (Ref. female) | 0.02 | 0.01 | −0.04 | −0.01 | 0.04 | 0.05 |
Constant | 1.63 | 1.81 | 3.19 | 3.30 | 1.83 | 2.07 |
Observations | 515 | 513 | 512 | 515 | 511 | 516 |
R² | 0.071 | 0.076 | 0.048 | 0.076 | 0.027 | 0.016 |
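To illustrate what the plain text version of such a table looks like, here is a small sketch that writes the first two value columns of the table above as CSV text with Python's standard `csv` module. The short column names and the `to_csv_text` helper are hypothetical choices for this example.

```python
import csv
import io

# Short, space-free column names (hypothetical) and a subset of the table.
HEADER = ["outcome", "devaluing_working", "devaluing_nonworking"]
ROWS = [
    ["Caregiver's gender (Ref. female)", "0.02", "0.01"],
    ["Constant", "1.63", "1.81"],
    ["Observations", "515", "513"],
]


def to_csv_text(header, rows):
    """Serialize a header and rows as comma-separated plain text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(HEADER, ROWS)
print(csv_text)

# Reading the text back should reproduce the same table.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

To write an actual file instead of an in-memory string, the same writer can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.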
11. No Calculations in the Raw Data Files (Broman & Woo, 2018)
The best strategy here is to make a copy of your files and do your calculations in the copy. Raw data should present the data as originally collected, free from interpretation and analysis. In the “bad” example below, the table has been shared with only the computed scale score, and the raw data for each scale item are missing.
Good Practice:
Id | Gender | Age | Personality1 | Personality2 | Personality3 | Personality4 | Personality5 |
1 | F | 18 | 4 | 3 | 5 | 4 | 4 |
2 | F | 18 | 3 | 3 | 4 | 4 | 3 |
3 | M | 19 | 2 | 4 | 2 | 4 | 3 |
4 | M | 18 | 5 | 4 | 1 | 2 | 4 |
Bad Practice:
Id | Gender | Age | Personality-scale-score-for-5-extraversion-items |
1 | F | 18 | 20 |
2 | F | 18 | 17 |
3 | M | 19 | 15 |
4 | M | 18 | 16 |
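One way to keep calculations out of the raw data is to compute derived values in a separate copy. The sketch below uses the four rows from the good-practice table and adds the scale score only to the copy. The `PersonalityScore` column name and the `add_scale_score` helper are hypothetical, and summing the five items is assumed to be the scoring rule because it reproduces the scores in the bad-practice table.

```python
import copy

# Raw item-level data from the good-practice table above.
RAW = [
    {"Id": 1, "Gender": "F", "Age": 18, "Personality1": 4, "Personality2": 3,
     "Personality3": 5, "Personality4": 4, "Personality5": 4},
    {"Id": 2, "Gender": "F", "Age": 18, "Personality1": 3, "Personality2": 3,
     "Personality3": 4, "Personality4": 4, "Personality5": 3},
    {"Id": 3, "Gender": "M", "Age": 19, "Personality1": 2, "Personality2": 4,
     "Personality3": 2, "Personality4": 4, "Personality5": 3},
    {"Id": 4, "Gender": "M", "Age": 18, "Personality1": 5, "Personality2": 4,
     "Personality3": 1, "Personality4": 2, "Personality5": 4},
]

ITEMS = ["Personality1", "Personality2", "Personality3",
         "Personality4", "Personality5"]


def add_scale_score(rows):
    """Return a copy of the rows with a summed score; the input is untouched."""
    derived = copy.deepcopy(rows)
    for row in derived:
        row["PersonalityScore"] = sum(row[item] for item in ITEMS)
    return derived


derived = add_scale_score(RAW)
scores = [row["PersonalityScore"] for row in derived]
print(scores)  # [20, 17, 15, 16]
```

The raw rows still contain only the item responses, so anyone who receives them can recompute the score or apply a different scoring rule.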
12. Make it a Single Big Rectangle
The best layout for your data within a spreadsheet is a single rectangle, with participants/samples/patients in rows and variables in columns (Broman & Woo, 2018). Below, see an example of a spreadsheet with a non-rectangular layout, taken from Broman & Woo (2018).
Good Practice:
Row | A | B | C | D | E | F |
1 | weight | 100 | 93 | 99 | 87 | 78 |
2 | sex | male | female | male | male | male |
3 | glucose | 134 | 120 | 124 | 83 | 105 |
4 | insulin | 0.60 | 1.18 | 1.23 | 1.16 | 0.73 |
Bad Practice:
Row | A | B | C | D | E | F |
1 | | | | | | |
2 | | 101 | 102 | 103 | 104 | 105 |
3 | sex | male | female | male | male | male |
4 | | | | | | |
5 | | 101 | 102 | 103 | 104 | 105 |
6 | glucose | 134 | 120 | 124 | 83 | 105 |
7 | | | | | | |
8 | | 101 | 102 | 103 | 104 | 105 |
9 | insulin | 0.60 | 1.18 | 1.23 | 1.16 | 0.73 |
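A quick structural check can show whether a sheet forms a single rectangle. The sketch below treats a sheet as a list of rows and reports blank rows, rows of the wrong width, and blank cells. The `rectangle_problems` helper is hypothetical, and the two grids are transcribed from the examples above (the bad grid is shortened to its first three rows).

```python
def rectangle_problems(grid):
    """Describe every row that breaks a single-rectangle layout."""
    if not grid:
        return []
    problems = []
    width = len(grid[0])
    for number, row in enumerate(grid, start=1):
        if all(cell == "" for cell in row):
            problems.append(f"row {number}: blank row")
        elif len(row) != width:
            problems.append(f"row {number}: {len(row)} cells, expected {width}")
        elif any(cell == "" for cell in row):
            problems.append(f"row {number}: blank cell inside the rectangle")
    return problems


good_grid = [
    ["weight", "100", "93", "99", "87", "78"],
    ["sex", "male", "female", "male", "male", "male"],
    ["glucose", "134", "120", "124", "83", "105"],
    ["insulin", "0.60", "1.18", "1.23", "1.16", "0.73"],
]

# First three rows of the bad-practice sheet: a blank row, a row of
# repeated IDs with an empty first cell, and a data row.
bad_grid = [
    ["", "", "", "", "", ""],
    ["", "101", "102", "103", "104", "105"],
    ["sex", "male", "female", "male", "male", "male"],
]

print(rectangle_problems(good_grid))
print(rectangle_problems(bad_grid))
```

The good grid produces no messages, while the bad grid is flagged for its blank separator row and for the ID row that starts with an empty cell.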