Writing about Statistical Methods

The Methods Section

Overall, the methods section of a paper includes relevant project details such that the study can be replicated.
- Replication should be possible for both the subject-matter expert and the person performing the data analysis.
The subsections we expect to contribute the most to:
- Data Management (optional section)
- Statistical Analysis
- Supplementary/Appendix Materials (optional section)

Data Wrangling

The data wrangling portion of the methods section should outline how the analysis dataset was created from the raw dataset(s).
- This is why I maintain that we should do all data management with programming – we have a record of everything, from start to finish.
Inclusion/exclusion criteria should be outlined.
- “Participants with missing data on key covariates were excluded from analysis.”
- “The analysis data was restricted to those in the ‘first time in college’ cohort.”
Creation of new variables should be outlined.
- “Hypertension was defined as systolic blood pressure > 140, diastolic blood pressure > 90, or use of antihypertensive medications.”
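
As a rough sketch of what a scripted record of these steps might look like, the R code below applies an exclusion rule and derives the hypertension indicator from the definition quoted above. The data frame and all column names (`sbp`, `dbp`, `bp_meds`, `age`) are invented for illustration and are not from any real study.

```r
# Hypothetical raw data; every column name here is made up for illustration.
raw <- data.frame(
  id      = 1:6,
  sbp     = c(120, 150, 135, 128, NA, 142),  # systolic blood pressure
  dbp     = c(80, 85, 95, 70, 88, 91),       # diastolic blood pressure
  bp_meds = c(0, 0, 0, 1, 0, 0),             # 1 = uses antihypertensive medication
  age     = c(45, 60, NA, 52, 39, 71)        # a key covariate
)

# Exclusion: drop participants with missing data on any key covariate.
key_covariates <- c("sbp", "dbp", "bp_meds", "age")
analysis <- raw[complete.cases(raw[, key_covariates]), ]

# New variable: hypertension as defined in the methods text.
analysis$hypertension <- as.integer(
  analysis$sbp > 140 | analysis$dbp > 90 | analysis$bp_meds == 1
)

# Keep a count of exclusions so it can be reported in the paper.
n_excluded <- nrow(raw) - nrow(analysis)
```

Because every step lives in the script, the sentences in the methods section can be checked directly against the code.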

Order Matters

List methodology in the order in which results are presented.
Typically the order is:
- Descriptive statistics (Table 1)
- Research Question 1 (e.g., Table 2)
- Research Question 2 (e.g., Table 3)
- ….
- Software utilized
  - Data management
  - Statistical analysis
  - Graphing
- A priori significance level

Appearance in Paper Matters

If a table/graph (or analysis) is not discussed in the paper, it should not be included in the paper.
This also extends to the methods section: we only describe methods that are used in the paper.

Table 1

Table 1 (descriptives)
- Descriptive statistics are shown as [mean/median] ([standard deviation/IQR]) for continuous variables or n (%) for categorical variables.
  - These tables are sometimes split by grouping variables.
  - If comparing groups in your Table 1, indicate what test(s) were used to compare the groups.
  - Note: there is some debate about using p-values in Table 1… but the idea is to show the audience what differences may or may not exist.
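
One possible way to build these cells in base R is sketched below. The helper names `fmt_median_iqr` and `fmt_n_pct` are hypothetical (they do not come from any package), and the built-in `mtcars` data stands in for a real analysis dataset.

```r
# Hypothetical helpers for Table 1 cells; not part of any package.
fmt_median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
  # median (25th percentile, 75th percentile)
  sprintf("%.1f (%.1f, %.1f)", q[2], q[1], q[3])
}

fmt_n_pct <- function(x) {
  counts <- table(x)
  pct <- 100 * counts / sum(counts)
  # n (%) for each level, named by level
  setNames(
    sprintf("%d (%.1f%%)", as.integer(counts), as.numeric(pct)),
    names(counts)
  )
}

# mtcars ships with R: mpg is continuous, cyl is treated as categorical.
mpg_cell  <- fmt_median_iqr(mtcars$mpg)
cyl_cells <- fmt_n_pct(mtcars$cyl)
```

Whatever summary is chosen, the methods text should name it exactly (here, median with the 25th and 75th percentiles).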

Tables 2 and Beyond

The tables included after Table 1 typically correspond to research questions/hypotheses.
For each table, indicate what outcome was modeled, what method was used to model it, and any relevant notes about the modeling.
- Were any transformations used?
- Were any covariates adjusted for?
  - Only necessary to note when tables showing model results exclude adjustors.
- Were assumptions met?
  - Normality of residuals & equal variances
  - Proportional odds or hazards
  - Multicollinearity

We also want to explicitly state how the results are displayed in the tables.
- My preference: the coefficient as we will interpret it and its corresponding 95% CI.
  - Normal: \hat{\beta} (95% CI)
  - Gamma: \exp\{\hat{\beta}\} (95% CI)
  - Logistic: Odds Ratio (95% CI)
  - Poisson: Incidence Rate Ratio (95% CI)
  - Cox: Hazard Ratio (95% CI)
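
To illustrate the logistic row, here is a minimal sketch that turns model coefficients into odds ratios with 95% CIs. It assumes Wald intervals (`confint.default`) and uses the built-in `mtcars` data purely as a stand-in; the same exponentiate-the-estimate-and-limits pattern would apply to the Gamma (log link), Poisson, and Cox rows.

```r
# Illustration only: transmission type (am, 0/1) modeled by weight (wt).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Wald 95% CI on the log-odds scale.
est <- coef(fit)
ci  <- confint.default(fit, level = 0.95)

# Exponentiate the estimate and both limits together to get OR (95% CI).
or_table <- exp(cbind(OR = est, ci))

# Format each row as "OR (lower, upper)" for the results table.
cells <- sprintf("%.2f (%.2f, %.2f)",
                 or_table[, 1], or_table[, 2], or_table[, 3])
names(cells) <- rownames(or_table)
```

The interval is computed on the coefficient scale first and exponentiated afterward, so it will not be symmetric around the odds ratio.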

The Technical “Details”

Finally, we include technical details about how the data was handled during the analysis.
We always state (and cite) the software used to produce information for the paper.
- Yes, sometimes I use different programs for data management, statistical analysis, and graphing….
Each software package has a preferred format for how it should be cited in papers.
- SAS software, version [version number] (SAS Institute, Cary, NC).
- Stata statistical software, release [release number] (StataCorp, College Station, TX).
- R version [version number] (R Foundation for Statistical Computing, Vienna, Austria).

The Technical “Details”

YES, we will cite the packages used in addition to base R!
- This is the nice thing to do – we recognize those that made our lives easier with programming.
- But also, this is how academic developers can quantify and demonstrate the usefulness of their contributions to the R community (for annual evaluation purposes).
For base R:

citation()

For packages:

citation("package_name")

The Technical “Details”

citation()

To cite R in publications use:

  R Core Team (2025). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2025},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

The Technical “Details”

citation("tidyverse")

To cite package 'tidyverse' in publications use:

  Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
  Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
  E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
  Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
  the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
  doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Welcome to the {tidyverse}},
    author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
    year = {2019},
    journal = {Journal of Open Source Software},
    volume = {4},
    number = {43},
    pages = {1686},
    doi = {10.21105/joss.01686},
  }

The Technical “Details”

citation("bayesrules")

To cite bayesrules package in publications use:

  Mine Dogucu, Alicia Johnson, Miles Ott (2021). bayesrules: Datasets
  and Supplemental Functions from Bayes Rules! Book Retrieved from
  https://github.com/bayes-rules/bayesrules R package version 0.0.2.9000

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {bayesrules: Datasets and Supplemental Functions from Bayes Rules! Book},
    author = {Mine Dogucu and Alicia Johnson and Miles Ott},
    year = {2021},
    url = {https://github.com/bayes-rules/bayesrules},
    note = {R package version 0.0.2.9000},
  }
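
If you manage references with BibTeX, one way to avoid copying these entries by hand is to export them from R. This is a sketch, not a required workflow: the package list is just an example (swap in the packages your analysis loaded), and the file is written to a temporary location for illustration.

```r
# Example package list; replace with the packages actually used.
pkgs <- c("base", "stats")

# toBibtex() converts each citation object to BibTeX text.
entries <- lapply(pkgs, function(p) toBibtex(citation(p)))

# Write all entries to a single .bib file (temporary file for illustration).
bib_file <- tempfile(fileext = ".bib")
writeLines(unlist(lapply(entries, as.character)), bib_file)

# Version string to use in the software statement.
r_version <- R.version.string
```

Note that the generated entries have an empty key (`@Manual{,`), so a citation key needs to be added before the file can be used.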

The Technical “Details”

Finally (finally!), we should state the a priori significance level if we are analyzing under the frequentist framework.
If we are analyzing under the Bayesian framework, that will be obvious when we describe our modeling approach.

Example 1: Table 1

Descriptive data are shown as median (range) or n (%), as appropriate.
- What numbers are shown in the cells of the table?
Continuous variables were compared using the Kruskal-Wallis test. Categorical variables were compared using the \chi^2 or Fisher’s exact tests, as appropriate.
- How were p-values generated?
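
For reference, the tests named in this example could be run in base R roughly as follows. The built-in `mtcars` data is a stand-in, and the "expected count below 5" rule for choosing Fisher’s exact test is one common rule of thumb, assumed here rather than taken from the paper.

```r
# Grouping variable for illustration: number of cylinders (4, 6, 8).
grp <- factor(mtcars$cyl)

# Continuous variable across groups: Kruskal-Wallis test.
kw <- kruskal.test(mtcars$mpg, grp)

# Categorical variable across groups: chi-square test, or Fisher's exact
# test when any expected cell count is below 5 (assumed rule of thumb).
tab <- table(grp, mtcars$am)
expected <- suppressWarnings(chisq.test(tab))$expected
cat_test <- if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)

# The $method element records which test was actually used.
test_used <- cat_test$method
```

Recording `test_used` alongside the p-value makes it easy to state "as appropriate" accurately in the methods text.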

Example 1: Table 2+

Hospital mortality and use of extracorporeal membrane oxygenation were modeled using logistic regression.
- What outcomes were modeled and how?

Example 1: Table 2+

Regression results are shown as odds ratio (95% confidence interval; CI).
- What are in the cells of the table?

Example 1: Table 2+

Significance of predictors was assessed using the omnibus Wald \chi^2 test. When appropriate, pairwise comparisons were made using the Bonferroni adjustment.
- How were p-values generated/evaluated?
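
A hedged sketch of both pieces in base R: the omnibus Wald \chi^2 statistic is computed by hand from the coefficient vector and its covariance matrix, and the Bonferroni adjustment uses `p.adjust`. The model uses built-in `mtcars` data, and the pairwise p-values are hypothetical numbers chosen only to show the adjustment.

```r
# Illustration: am (0/1) modeled by a 3-level categorical predictor, cyl.
dat <- transform(mtcars, cyl = factor(cyl))
fit <- glm(am ~ cyl, data = dat, family = binomial)

# Omnibus Wald chi-square: test all cyl coefficients jointly,
# W = b' V^{-1} b, with df = number of coefficients tested.
idx  <- grep("^cyl", names(coef(fit)))
b    <- coef(fit)[idx]
V    <- vcov(fit)[idx, idx]
wald <- as.numeric(t(b) %*% solve(V) %*% b)
p_omnibus <- pchisq(wald, df = length(idx), lower.tail = FALSE)

# Bonferroni adjustment for three pairwise comparisons.
# These raw p-values are hypothetical, for illustration only.
raw_p <- c("4 vs 6" = 0.20, "4 vs 8" = 0.004, "6 vs 8" = 0.03)
adj_p <- p.adjust(raw_p, method = "bonferroni")
```

Bonferroni multiplies each raw p-value by the number of comparisons (capped at 1), so the adjusted values can be compared directly to the a priori significance level.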

Example 1: Table 2+

Note the informative footnote!
- I try to include information in footnotes as a reminder for the reader.

Example 1: Pulled together

Descriptive data are shown as median (range) or n (%), as appropriate. Continuous variables were compared using the Kruskal-Wallis test. Categorical variables were compared using the \chi^2 or Fisher’s exact tests, as appropriate.
Hospital mortality and use of extracorporeal membrane oxygenation were modeled using logistic regression. Regression results are shown as odds ratio (95% confidence interval; CI). Significance of predictors was assessed using the omnibus Wald \chi^2 test. When appropriate, pairwise comparisons were made using the Bonferroni adjustment.
Data management and analysis were performed using SAS software, Version 9.3 (SAS Institute, Inc., Cary, NC). A priori significance was defined as p < 0.05.

Example 2: Table 2+

Sometimes we decide to show results for a subset of predictors.

Example 2: Table 2+

We just need to make sure that information is noted somewhere.
- Although the information is in the footnote, the models being compared/contrasted must be clearly outlined in the methods section.

Example 3: Table 2+

Correlations between covariates at initial survey were assessed using Spearman’s correlation.
- What method was used to assess correlations?
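
In R, a Spearman correlation matrix and a test for a single pair might look like the sketch below. The `mtcars` columns are placeholders for the survey covariates, which are not available here.

```r
# Placeholder covariates from the built-in mtcars data.
covars <- mtcars[, c("mpg", "wt", "hp")]

# Spearman correlation matrix, using all available pairs of observations.
rho_matrix <- cor(covars, method = "spearman", use = "pairwise.complete.obs")

# Test for one pair; exact = FALSE avoids the warning issued when ties exist.
rho_test <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
```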

Example 3: Table 2+

Let’s take a look at the table description and footnotes…

Example 3: Table 2+

Hierarchical linear models (HLM) were used when modeling all outcomes of interest. First, the models had only sex, type of SCD, and the baseline sleep quality score as predictors. Then, the models were further adjusted for depression score. Finally, the models were further adjusted for pain frequency, intensity, and interference. Because all HLM incrementally increased the R² value, only the fully adjusted models are shown.

Example 3: Table 2+

Sleep quality and age-appropriate sleep were modeled using binary logistic regression, while sleep duration was modeled using ordinary least squares regression. Regression results are shown as odds ratio (95% confidence interval; CI) for logistic regression and unstandardized beta coefficients (95% CI) for linear regression. Significance of predictors was assessed using the omnibus Wald \chi^2 test for categorical predictors. As necessary, pairwise comparisons were made using the Bonferroni adjustment.

Example 4: Supplementary Materials

Sometimes, we have additional analyses that are not central to the main results and can be cut from the main body of the paper for either space or interpretability purposes.
- sensitivity analysis
- full model results, including results from adjustors
- recoding during data management (e.g., collapsing categories)
Example of recently-published supplement: click here.

Example 4: Supplementary Materials

The original rule still applies, though: if there is no mention of it in the paper, it should not be included in the supplementary materials.
- e.g., Likert-scale questions were condensed into three categories: agree, neutral, and disagree. Full details on the recoding process, including summary statistics, are shown in Appendix A.
- e.g., Results shown in Table 2 focus on the primary predictors of interest. Full model results, including adjustors, are shown in Appendix B.
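
As a sketch of the kind of recoding an appendix might document, the code below collapses a five-point Likert item into the three categories named above. The responses and level labels are hypothetical.

```r
# Hypothetical five-point Likert responses.
responses <- c("Strongly agree", "Agree", "Neutral", "Disagree",
               "Strongly disagree", "Agree", "Neutral")

# Map the five original levels onto three categories.
collapse_map <- c(
  "Strongly agree"    = "Agree",
  "Agree"             = "Agree",
  "Neutral"           = "Neutral",
  "Disagree"          = "Disagree",
  "Strongly disagree" = "Disagree"
)
collapsed <- factor(unname(collapse_map[responses]),
                    levels = c("Agree", "Neutral", "Disagree"))

# Cross-tabulate original vs. collapsed responses to document the recoding.
recode_check <- table(original = responses, collapsed = collapsed)
```

A cross-tabulation like `recode_check` is a compact way to show the reader exactly which original responses ended up in each category.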

Example 5: Writing More About Methods

Depending on the field, more information about methods may be necessary to help the reader interpret the results. Consider this graph:

Example 5: Writing More About Methods

What did the resulting statistical methods section look like?

Example 5: Writing More About Methods

Why are we talking about this? (Adding details for interpretations.)
You will be expected to provide basic examples of how to interpret coefficients in the model(s) you are using in this analysis.
Again, only include what is necessary to understand the results.
- Goal: give the reader the background necessary to understand what they’re looking at.
- Actual goal: reproducibility!

Other Notes

When working with others & disagreeing on what should be included, I will ask myself, “is this my hill to die on?”
Battles I have won:
- “Multivariable” vs. “Multivariate”
- Including interval estimates in tables
- Excluding/including p-values in Table 1
Battles I have not won:
- “Multivariable” vs. “Multivariate”
- Including interval estimates in tables
- Excluding/including p-values in Table 1
Actual hill to die on: post hoc power analysis (1, 2).

Other Notes

I do not write this section until after the results section is finalized and/or I know exactly what tables and graphs are being included in the paper.
Note that tables generally take the same appearance within a paper. I make sure we are consistent with how results are displayed across tables, which also helps with writing the methods section.
- Think: your brand in science.
However, sometimes a journal has requirements for tables. Fields have different ways they present model results.
- My approach: check target journal guidelines; look at recent papers in target journal to see if there is a common theme for tables of results.

Wrap Up

Today we have talked about the basics of writing a statistical methods section corresponding to the analysis you performed.
While this discussion is in the academic context, you can still employ these methods outside of academic papers:
- Think of this as leaving yourself (or your colleagues) notes on how the analysis was performed.
- Again, our real goal here is to make our analyses reproducible.

Upcoming Schedule

Wednesday:
- Individual students are meeting with Drs. Seals and Schmutz.
- Make sure you sign up for a time.
- Meeting time is limited to 5 minutes.
Next deliverables:
- Internal analysis worksheet.
- External analysis worksheet.
- Email to collaborator with external analysis worksheet.
Reminder 1! No class next week (Thanksgiving holiday).
Reminder 2! No formal meetings the week after.
- Monday (December 1): meet with EES collaborator.
- Wednesday (December 3): work on final draft of research; Dr. Seals available by appointment during class time.