and similarly use this address to tell us about additional resources that you think other users might find helpful.
Basic ideas and techniques
Histograms
http://tinlizzie.org/histograms/
Statistical testing
Where to start with regard to statistical testing of data.
Power, effect size and sample size calculations
The R system/environment/programming language/statistics package
R is an integrated suite of software facilities for data manipulation, graphical display, statistical analysis, calculation and simulation. It handles and analyses data very effectively and includes a suite of operators for calculations on arrays and matrices. It also has the graphical capabilities to produce very sophisticated graphs and data displays. Finally, it is an elegant, object-oriented programming language.
It is freely available, and can be downloaded (for Windows, Mac and Linux platforms) from https://cran.r-project.org/
The same site also gives access to a huge library of free add-on packages.
Some introductions/tutorials for beginners and inexperienced users (best read while sitting at a computer so you can try the examples as you go):
- R for Beginners, by Emmanuel Paradis: http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
- Using R for Data Analysis and Graphics: Introduction, Code and Commentary, by J.H. Maindonald: http://cran.r-project.org/doc/contrib/usingR.pdf
- Simple R - Using R for Introductory Statistics, by John Verzani: http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf
- The R Guide, by W.J. Owen: https://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
- An introduction to R, by Longhow Lam: http://cran.r-project.org/doc/contrib/Lam-IntroductionToR_LHL.pdf
CRAN contributed documentation: http://cran.r-project.org/other-docs.html. This page has a lot of useful material, most of it written for statisticians and/or programmers and therefore fairly technical.
Visualisation: ideas and discussion
Statsref: Statistical Analysis Handbook
A more comprehensive web-based handbook, covering much of the standard elementary and intermediate statistical methodology:
http://www.statsref.com/
'Five ways to fix statistics'
A fascinating commentary in Nature that should be read by all scientists: https://www.nature.com/articles/d41586-017-07522-z
As debate rumbles on about how and how much poor statistics is to blame for poor reproducibility, Nature asked influential statisticians to recommend one change to improve science. The common theme? The problem is not our maths, but ourselves.
More advanced techniques
Logistic regression
For help with interpreting logistic regression output, this video might be useful: https://www.youtube.com/watch?v=ckkiG-SDuV8
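When reading logistic regression output, it often helps to remember that the coefficients are on the log-odds scale. Exponentiating a coefficient gives an odds ratio, and passing the linear predictor through the logistic function gives a probability. The following is a minimal sketch in plain Python; the function names and the fitted values are invented purely for illustration and do not come from any real analysis.

```python
import math


def odds_ratio(coefficient):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficient, x):
    """Probability implied by a one-predictor logistic model at value x."""
    log_odds = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical fitted values, made up for this illustration only.
example_intercept = -1.5
example_coefficient = 0.8

# Multiplicative change in the odds for a one-unit increase in x.
print("odds ratio:", odds_ratio(example_coefficient))

# Fitted probability of the outcome at x = 2.
print("P(y = 1 | x = 2):", predicted_probability(example_intercept, example_coefficient, 2.0))
```

An odds ratio above 1 means the odds of the outcome rise as the predictor increases; a value below 1 means they fall. Note that an odds ratio is not a ratio of probabilities.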
Modern alternatives to ANOVA
Slides for a talk on this topic by Jonty Rougier in April 2018, and accompanying R code.
R for Data Science
R for Data Science - Garrett Grolemund & Hadley Wickham
An accessible introduction to creating a 'data science workflow' in R.
Regression with errors in both variables
When your data has observational/measurement error or noise in both x (predictor/covariate/independent variable) and y (response/dependent variable), ordinary regression techniques like simple linear regression are not really valid: they assume that x is known exactly, and error in x typically biases the estimated slope towards zero. See https://en.wikipedia.org/wiki/Errors-in-variables_models for some discussion; as you will see there, you need to think carefully about your assumptions in this situation. For some purposes, so-called Deming regression is appropriate. This (along with several generalisations you might also consider) is provided in the R package 'deming': https://cran.r-project.org/web/packages/deming/index.html
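To give a feel for what Deming regression computes, here is a minimal sketch of the standard closed-form estimator for a straight line, written in plain Python. It is not the code of the R package 'deming', which should be preferred for real work. The function name `deming_fit` and the parameter `delta` are introduced here for illustration; `delta` is the ratio of the error variance in y to the error variance in x, which is assumed to be known. The example data are made up.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Fit y = intercept + slope * x by Deming regression.

    delta is the assumed ratio (error variance in y) / (error variance in x).
    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance.
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("sample covariance is zero; slope is undefined")

    # Closed-form Deming slope; the intercept follows from the means.
    d = s_yy - delta * s_xx
    slope = (d + math.sqrt(d * d + 4.0 * delta * s_xy * s_xy)) / (2.0 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up measurements with noise in both variables (illustration only).
x_obs = [1.0, 2.1, 2.9, 4.2, 5.1]
y_obs = [2.9, 5.2, 6.8, 9.1, 11.2]
print(deming_fit(x_obs, y_obs, delta=1.0))
```

With `delta=1.0` the errors in x and y are treated as equally large, which corresponds to orthogonal regression. In practice the value of `delta` has to come from what you know about your measurement process, which is one of the assumptions that needs careful thought.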
Multilevel modelling
The LEMMA course from the Centre for Multilevel Modelling is a great resource for anyone who needs to do something with this type of modelling:
http://www.bristol.ac.uk/cmm/learning/online-course/
Biostatistics
Frank Harrell's book Biostatistics for Biomedical Research: http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf
In fact, Harrell has a lot of useful resources, notably in biostatistics; a good place to start is his blog: http://www.fharrell.com/
Statistical learning
The following MIT open course on artificial intelligence has some great lectures on statistical learning:
An accessible introduction to statistical learning (i.e. one suitable for researchers without mathematical training):
An Introduction to Statistical Learning with applications in R - Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
Link to website: http://www-bcf.usc.edu/~gareth/ISL/
Free pdf version here: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
Videolectures.net
A huge library (over 23,000 videos!) of generally well-produced, high-quality, advanced-level video lectures, concentrating on machine learning and applications. Use the search facility to locate material.
Reproducibility
A presentation on the importance of reproducibility in science, and the role of statistics in promoting this. The local lead of the Bristol Reproducibility Network is Hugo Pedder (hugo.pedder at b**l.ac.uk)
Getting more organised
Why you should be very careful if you use a spreadsheet like Excel for your scientific data
- https://www.theguardian.com/politics/2020/oct/05/how-excel-may-have-caused-loss-of-16000-covid-tests-in-england
- http://andrewgelman.com/2013/04/17/excel-bashing/
- http://www.sciencemag.org/news/sifter/one-five-genetics-papers-contains-errors-thanks-microsoft-excel
- https://science.slashdot.org/story/14/05/27/220202/why-you-shouldnt-use-spreadsheets-for-important-work
- https://www.reddit.com/r/askscience/comments/fifri/when_should_scientists_not_use_excel/
Housekeeping, and maintaining code and data
See http://www.brown.edu/Research/Shapiro/pdfs/CodeAndData.pdf
This is all good advice, and I expect most of us are doing most of this already. The one thing I deliberately don't do is use version control, which I tried for several years (Subversion, git) but didn't like, but then I'm old school. Instead, I use synchronised version numbers in file names. The thing I'm not doing but ought to is using a project management tool for my collaborations, instead of relying on email threads.
Misuse/misinterpretation of statistical data
Interesting examples
The following are from the October 2016 issue of 'Significance':
"Predictive policing systems are used increasingly by law enforcement to try to prevent crime before it occurs. But what happens when these systems are trained using biased data? …"
The article makes a strong and disturbing case:
http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2016.00960.x/full
(The article is available on open access.)
Misuse of statistics by the UK Department for Education features in: "According to the UK's Department for Education, 'missing the equivalent of just one week a year from school can mean a child is significantly less likely to achieve good GCSE grades'. Can this really be true?"
http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2016.00959.x/full (A subscription is needed to see the full article.)
It is a bit ironic that the DfE has shown itself to be so badly educated in the use of statistical information.
The only trouble is that they took it to heart and made national policy around it, which is now collecting masses of fines in England…
"Term-Time Holiday Fines Top £4m As ITV Reveals More Than 60,000 Fines Have Been Issued"
http://www.huffingtonpost.co.uk/entry/term-time-holiday-fines_uk_5808b640e4b07ebc072c56bd