Assignment | # of graded assignments | Weight each | Weight total |
---|---|---|---|
In-class participation | 10 | 2% | 20% |
In-class activities | 10 | 3% | 30% |
Paper discussion | | | |
Oral presentation | 1 | 8% | 8% |
Paper summary | 1 | 8% | 8% |
Peer review | 1 | 4% | 4% |
Case study | | | |
Written report | 1 | 8% | 8% |
Oral presentation | 1 | 4% | 4% |
Rigor/validity of data analysis | 1 | 8% | 8% |
Validity and efficiency of code | 1 | 7% | 7% |
Project organization | 1 | 3% | 3% |
STAT 778 — Statistical Computing
Administrative
- Course meetings: Wednesdays, Jan 22 – Apr 30 (final exam period: May 7). No class on March 14 (spring break)
- Class time: 7:20 – 10:00 P.M. in Innovation Hall 203
- Virtual classes are on Zoom (Meeting ID: 986 5225 7333 | Passcode: 344605).
- Instructor: Dr. David Kepplinger (he/him/his)
- Email: dkepplin@gmu.edu
- Office: Nguyen Engineering Building (ENGR), Room 1711
- Office hours: Thursday, 3 – 4 PM on Zoom (Meeting ID: 986 5225 7333 | Passcode: 344605)
- Canvas course page: https://canvas.gmu.edu/courses/28879
Course description
Overview
The following topics will be covered in some detail.
- Advanced coding in R and python
- Writing code in R with the “tidyverse” and python
- Common data structures
- Parallel computing
- Linear algebra routines and their applications in statistics
- PCA and dimension reduction techniques
- Errors induced by numerical and computational approximations.
- Stochastic simulation
- Uniform and non-uniform random number generation
- Monte Carlo methods
- Resampling techniques for inference
- Principles of parallel computing for statistical analyses
- Numerical integration, optimization and root finding
- Bayesian analysis
- Mixed effects models
- Cross-validation and hyperparameter selection
Learning outcomes
- Conducting stochastic simulations efficiently
- Adapting and implementing resampling methods for statistical inference
- Understanding the sources of errors in statistical computations
- Using parallelization and high-performance computing clusters for stochastic simulations
- Understanding the applications of linear algebra in statistical computations and how to efficiently utilize them
Prerequisites
STAT 652 and STAT 672 or equivalent; you must be comfortable with reading and writing R code and have working knowledge of functional programming.
Textbooks
The main textbooks for this course are
- Gentle, J. E., Härdle, W. K., & Mori, Y. (2012). Handbook of Computational Statistics (2nd edition). Springer. Available online.
- Lange, K. (2010). Numerical Analysis for Statisticians. Springer New York. doi:10.1007/978-1-4419-5945-4. Available online.
- Eubank, R. L., & Kupresanin, A. (2011). Statistical Computing in C++ and R (1st edition). Chapman and Hall/CRC. doi:10.1201/b11538. Available online.
- Wickham, H. (2019). Advanced R. Chapman and Hall/CRC. doi:10.1201/b17487. Available online.
- Givens, G. H., & Hoeting, J. A. (2013). Computational Statistics (2nd edition). Wiley. Available online.
The following reference books are also helpful resources for assignments and class discussions:
- Phillips, D. (2018). Python 3 Object-Oriented Programming (3rd edition). Packt Publishing. Available online.
- Stroustrup, B. (2014). Programming: Principles and Practice Using C++ (2nd edition). Addison-Wesley Professional. Available online.
- Meyers, S. (2014). Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14. O’Reilly. Available online.
- Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. Springer New York. Available online.
A number of relevant articles will be posted on Canvas as different topics are discussed.
Logistics
The class is scheduled as a face-to-face meeting on campus, with several classes conducted virtually over Zoom. See the schedule on the Canvas page for details.
All learners taking courses with a face-to-face component are required to follow the university’s public health and safety precautions and procedures outlined on the University’s Safe Return to Campus webpage. If the campus closes, or if a class meeting needs to be canceled or adjusted due to weather or another concern, learners should check the Canvas course for updates on how to continue learning and for information about any changes to events or assignments.
Communications
The Canvas site for this course is the primary channel of communication. Please check the Canvas course regularly for updates! Information posted on the Canvas site includes
- announcements,
- lecture notes,
- in-class activities, homework assignments, project instructions,
- changes to the posted office hours,
- handouts and readings.
Any question related to concepts and topics should be asked on the Canvas discussion board (under Discussions > Course Q&A). Questions will be visible to all registered students, and everyone should actively participate in answering questions posted by peers. Active participation in answering questions will be counted towards the participation grade.
E-mail communication must be restricted to questions relating to sensitive and confidential information (such as grade concerns, personal circumstances requiring specific accommodations, etc.).
- E-mails will be returned within 2 business days and may not be returned on weekends/holidays.
- When you send an e-mail to me, please put STAT 778 at the beginning of the subject line.
- E-mails related to this course must be sent and received via your Mason e-mail account. E-mails sent from other e-mail accounts may not be answered. (This is a university policy and part of your guaranteed rights under FERPA.)
- E-mails with questions that should be posted to our course Q&A may not be answered.
Should you have concerns that you may not be able to fully participate or engage in any of the activities listed below, please do not hesitate to contact me, either by e-mail or in person during office hours or after class. We can discuss alternative arrangements that suit your needs.
Hardware requirements
We will frequently use laptop computers for in-class activities. Please be respectful of your peers and your instructor and do not engage in activities that are unrelated to the class.
Software requirements
This class will use the following interpreters and programming environments:
- R (version 4.4 or higher; available from CRAN)
- RStudio IDE (version 2024.12.0 or newer; available from Posit Co.)
- python (version 3.10 or newer) with the miniconda package and environment manager
- A Git client on your computer. This course will be taught using SourceTree, which is available as a free download. Alternatively, you may choose to use a different interface for Git, such as GitHub Desktop, the UI built into RStudio Desktop, or the git command line interface.
Activities and assignments in this course may sometimes use web-conferencing software (Zoom). In addition to the requirements above, you are required to have a device with a functional camera and microphone. In an emergency, you can connect through a telephone call, but video connection is the expected norm.
Grading
Your grade in this course will be based on in-class participation (including weekly oral quizzes about readings), in-class activities and related homework assignments of various types, a case study, and a paper discussion.
Written and oral communication are an integral part of any statistical work, and as such, grammar, style, and spelling are part of grading rubrics applied to all deliverables. You are strongly encouraged to use the resources and tutoring offered by the writing center (https://writingcenter.gmu.edu).
All assignments in this course are designated as individual assignments, which are to be undertaken independently. You may discuss your ideas with others but everything you turn in must be your own work. You may not share analyses, graphs, code, and other materials. You are responsible for making sure that there is no reason to doubt that the work you hand in is your own.
Attendance
Attendance is mandatory. In-class participation, two oral presentations, and presentation notes on two other oral presentations are part of your final grade. You will be able to choose the two dates of your oral presentations at the beginning of the term, but the dates for note taking and peer reviewing will be assigned by the instructor after the presentation dates are chosen.
In case of approved absence, please get notes from your peers. You are responsible for material covered in class and announcements made during class.
Participation
Success in this course requires active participation in in-class activities and discussions, for which you will need to prepare in advance. Accordingly, you are expected to prepare for each class period by
- reading the corresponding sections of textbooks or research articles to be covered in class,
- reviewing class materials posted on Canvas to be covered in class,
- familiarizing yourself with the use of the covered methods and techniques in a statistical programming system of your choice.
Mandatory readings for the following class will be posted by Friday 11:59 PM. An oral quiz at the beginning of each lecture will assess your understanding of the material covered in the readings.
In-class activities
There will be 11 in-class activities throughout the term, which will vary in length and content. The activity with your lowest score will not count towards your final grade. In-class activities are due the day after the lecture at 11:59 PM and must be submitted on Canvas.
Late submissions will be penalized by reducing the total number of points possible by 10% of the original total number of points for each day late. For example, if an in-class activity is worth a total of 10 points, it will be worth only 9 points when submitted within the first 24 hours after the due date, 8 points when submitted between 24 and 48 hours after the due date, and 7 points when submitted between 48 and 72 hours after the due date. Submissions will not be accepted more than 72 hours (3 days) past the due date.
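The late-penalty arithmetic above can be sketched in a few lines of python. This is only an illustration under stated assumptions, not an official grading tool: the function name `max_points_possible` is hypothetical, and the handling of submissions landing exactly on a 24-hour boundary (counted here as the earlier day) is an assumption the policy text does not spell out.

```python
import math
from typing import Optional


def max_points_possible(total_points: float, hours_late: float) -> Optional[float]:
    """Maximum points attainable for a submission that is `hours_late` hours late.

    Returns None when the submission would no longer be accepted.
    """
    if hours_late <= 0:
        # On-time submissions keep the full point total.
        return total_points
    if hours_late > 72:
        # More than 72 hours (3 days) late: not accepted.
        return None
    # Each started 24-hour period counts as one day late (assumption).
    days_late = math.ceil(hours_late / 24)
    # Deduct 10% of the ORIGINAL total per day late.
    return total_points * (10 - days_late) / 10


for hours in (0, 5, 30, 60, 80):
    print(hours, max_points_possible(10, hours))
```

For a 10-point activity this reproduces the example in the text: 9 points within the first day, 8 within the second, 7 within the third, and no credit afterwards.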
Paper discussion
You will be assigned one peer-reviewed research paper on which you need to prepare a written summary and an oral presentation. You will be required to provide an in-depth discussion of the paper as well as an application of its methods/ideas. More details will be provided in class and on the Canvas site.
Peer-review
You will be assigned to review a peer’s paper summary and to provide detailed feedback and suggestions.
Case study project
The case study project will involve designing, implementing, and summarizing a computationally intense statistical analysis. The analysis can be chosen by you, but must be approved by the instructor no later than April 9. More details will be provided in class. You must submit a written report on Canvas and the fully functional, well-structured and efficient code via GitHub. Specific deliverables depend on the chosen project and will be discussed individually. You must ensure that the analysis and all results are fully reproducible.
You are expected to approach your project report with the same level of preparation and presentation that you would associate with a finished product in your job as a statistician. The report that you write for this course will be graded on both your in-depth analysis and your writing. A report must be no more than 5 pages (excluding tables, figures, and references). Reports must not include any raw output from statistical software or any computer code.
Grading Policies
Percentage grade | Letter grade |
---|---|
≥ 90% | A |
≥ 80% but < 90% | B |
≥ 70% but < 80% | C |
< 70% | F |
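As a rough illustration of how the assignment weights and the letter-grade cutoffs combine, the following python sketch computes a weighted final percentage and maps it to a letter. The weights and cutoffs are taken from the tables in this syllabus; the names `WEIGHTS`, `final_percentage`, and `letter_grade` are hypothetical, and the sketch assumes that no rounding is applied before the cutoffs are checked, which the syllabus does not specify.

```python
# Total weight (in percent) of each graded component, from the grading table.
WEIGHTS = {
    "in_class_participation": 20,
    "in_class_activities": 30,
    "paper_oral_presentation": 8,
    "paper_summary": 8,
    "peer_review": 4,
    "case_written_report": 8,
    "case_oral_presentation": 4,
    "case_data_analysis": 8,
    "case_code": 7,
    "case_project_organization": 3,
}


def final_percentage(scores: dict[str, float]) -> float:
    """Weighted final percentage; `scores` holds the fraction earned (0 to 1) per component."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage: float) -> str:
    """Map a final percentage to a letter grade using the cutoffs in the table."""
    if percentage >= 90:
        return "A"
    if percentage >= 80:
        return "B"
    if percentage >= 70:
        return "C"
    return "F"


# Hypothetical student who earns full marks on every component.
perfect = {name: 1.0 for name in WEIGHTS}
print(final_percentage(perfect), letter_grade(final_percentage(perfect)))
```

The component weights sum to 100, so a student with full marks everywhere reaches 100% and an A.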
Regrading policies
You have at most one week after a score is posted for an assignment to appeal the score. If you want parts of an assignment regraded, send an e-mail to the instructor specifying the question/part and the reason for requesting a review of the grading. If you do not notify the instructor in writing of any issues with your score within that time, then the posted score stands (whether or not it is correct).
Further policies
Please see the George Mason Common Course Policies for additional policies governing this course.