Adds Getting Started section

2025-03-04 09:11:37 -05:00 · 2023-08-31 11:30:27 -04:00 · e091473db3
commit e091473db3
parent 4be795081a
6 changed files with 273 additions and 2 deletions
--- a/IEEE_754_Double_Floating_Point_Format.png
+++ b/IEEE_754_Double_Floating_Point_Format.png
--- a/_toc.yml
+++ b/_toc.yml
@@ -8,3 +8,8 @@ parts:
      chapters:
        - file: outline
          title: Course outline
+    - caption: Getting started
+      chapters:
+        - file: basics
+        - file: programming
+        - file: floats
--- a/basics.md
+++ b/basics.md
@@ -0,0 +1,102 @@
+# Getting set up
+
+## To do
+
+Here is a summary of what you need to do to get set up for the course:
+
+- Join the **Slack** workspace for the class. You should have received an email from Simon via myCourses (check your spam folder!).
+
+- Fill out the **when2meet poll** that Simon sent in the same email so we can find a time for the **Debug den**.
+
+- Make a **Github account** if you don't already have one, and make a private repository where you will submit your homework (as a Jupyter notebook for each homework). If you are making a new account, choose your username carefully: your Github account is a great place to build your computing portfolio, e.g. for future job searches!
+
+- Make sure you have **Python 3** installed and the **NumPy**, **SciPy**, and **Matplotlib** libraries.
+
+- We will be doing interactive exercises in class, so you should **bring your laptop to every class**. The classroom likely has limited power outlets available, so please charge your battery before class if needed!
+
+**If you need help with installation or with setting up a Github repository, or if you have questions about NumPy and Matplotlib basics, let the TAs know; they can help you in the Debug den sessions.**
+
+
+## Python
+
+In this course, we will use [Python 3](https://docs.python.org/3/). This will build on the exposure to Python you have already had in earlier courses. Python has the advantages of a high-level language (concise code, rapid prototyping, no compilation step, interactive use) while also having access to efficient numerical libraries that are implemented in C or Fortran under the hood. You may be interested in exploring the use of other languages for numerical computations, but for this course all submitted material must use Python 3.
+
+We will assume that you have access to:
+
+- [NumPy](https://numpy.org/doc/stable/)
+- [SciPy](https://docs.scipy.org/doc/scipy/tutorial/index.html#user-guide)
+- [Matplotlib](https://matplotlib.org/stable/#)
+- [Jupyter notebook](https://jupyter.org)
+
+If you are not already set up to run Python, a good way to install it is through [Anaconda](https://www.anaconda.com/download), which will also install all the libraries you need. If you already have Python installed but are missing a library (SciPy, for example), you can try `pip install scipy`.
+
+How you interact with Python is up to you. A really great way to code is in a Jupyter notebook. Homeworks should be submitted as Jupyter notebooks in your Github repository.  
+
+It might be helpful to have a look at some NumPy tutorials to remind you how to create and manipulate arrays, and to remind yourself how to make a basic plot in Matplotlib if needed. The links above point to the documentation for each library, which includes basic introductory tutorials. There are also many other tutorials online. One place to look is [Mike Zingale's class AST 390: Computational Astrophysics at Stony Brook](https://zingale.github.io/computational_astrophysics/intro.html),
+which has a section covering NumPy and Matplotlib.
+
+```{admonition} Exploring NumPy
+Just to get you started with NumPy, here are some expressions you can use to begin with:
+
+Creating arrays
+
+- `a = np.array([1.0,2.0])`
+- `np.ones(10)`
+- `np.zeros(10)`
+- `np.ones_like(a)`
+- `np.arange(2.0,10.0,0.5)` (the third argument is the step size)
+- `np.linspace(2.0,10.0,num=100)` (`num` is the number of points)
+
+Slicing arrays
+
+- `a[1:10]`
+- `a[:-1]`
+- `a[::2]`
+- `a[::-1]`
+
+To see what operations are available for an object
+
+`dir(a)`
+
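+It is helpful to be aware of when a variable is just a reference (pointer) to an existing array, when it is a view that shares memory with it, and when the data have actually been copied:
+
+- `B=A`   points to the same object (`A is B` will return True)
+- `B=A[:]`   view: a new array object that shares the same memory as `A` (`A is B` will return False, but changing `B` also changes `A`)
+- `B=A.copy()`    copy with its own memory (`A is B` will return False, and changing `B` leaves `A` unchanged)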
+```
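+
+The difference between a reference, a view, and a copy can be checked directly. The following is a minimal sketch, assuming NumPy is imported as `np`; the variable names `V` and `C` are only illustrative, and `np.shares_memory` is used to test whether two arrays overlap in memory.
+
+```python
+import numpy as np
+
+A = np.arange(5.0)      # array([0., 1., 2., 3., 4.])
+
+B = A                   # another name for the same object
+V = A[:]                # a view: new array object, same memory
+C = A.copy()            # an independent copy of the data
+
+print(B is A, V is A, C is A)      # True False False
+print(np.shares_memory(A, V))      # True
+print(np.shares_memory(A, C))      # False
+
+V[0] = 100.0            # writing through the view changes A (and B)
+C[1] = -1.0             # writing to the copy leaves A untouched
+print(A[0], A[1])       # 100.0 1.0
+```
+
+Note that this behaviour is specific to NumPy arrays: for a Python list, `A[:]` makes a new list rather than a view.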
+
+
+## Version control and Github
+
+In this course, we will make use of `git` which is a *version control* system (in particular, you will use this to submit your homework and will work in a collaborative group for your project).  Version control systems let you
+
+- keep track of changes that you make and revert back to previous versions
+
+- merge contributions from others (and deal with conflicts) and make contributions yourself to other projects
+
+- see who contributed what code and when it was added
+
+- work on updates to the code in separate "branches"
+
+- keep a backup of your work, if your central repository is hosted on Github
+
+You should become familiar with the basic git operations
+
+- `git init` or `git clone`
+
+- `git status` and `git log`
+
+- `git add`
+
+- `git commit -m` or `git commit -am`
+
+- `git push`
+
+- the `.gitignore` file
+
+
+A place to get started is the *Hello World* tutorial at Github: 
+
+https://docs.github.com/en/get-started/quickstart/hello-world
+
+There is also a section on Github in [Mike Zingale's class AST 390: Computational Astrophysics at Stony Brook](https://zingale.github.io/computational_astrophysics/intro.html).
--- a/floats.md
+++ b/floats.md
@@ -0,0 +1,105 @@
+# Floating point arithmetic
+
+## How floating point numbers are represented
+
+The format of floating point numbers is set out in the [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) standard. Double-precision floating point numbers are represented using 64 bits (8 bytes). These are split into 1 bit for the sign of the number (0 for positive, 1 for negative), 11 bits for the exponent, and the remaining 52 bits are for the mantissa (also known as the significand).
+
+![Illustration of bit formatting for a double precision float](IEEE_754_Double_Floating_Point_Format.png)
+
+([Image credit: Wikipedia](https://en.wikipedia.org/wiki/Double-precision_floating-point_format))
+
+The corresponding floating point number is 
+
+$$(-1)^s \times (1 + \sum_{i=1}^{52} b_{52-i} 2^{-i}) \times 2^{e-1023}$$
+
+where $s$ is the value of the sign bit (0 or 1), $e$ is the value of the exponent, and $b_i$ corresponds to bit $i$ in the fraction, as labelled in the diagram.
+
+The values of the exponents range from 
+- $e=00000000001\ (\mathrm{binary})=1\ (\mathrm{base\ 10})$, corresponding to $2^{e-1023}=2^{1-1023}=2^{-1022}$
+
+to
+
+- $e=11111111110=2046$, corresponding to $2^{e-1023}=2^{2046-1023}=2^{1023}$
+
+(The values $e=0$ and $e=2047$ have special meaning -- see the [Wikipedia page on Double-precision floating-point format](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) for more information.)
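+
+One way to look at the sign, exponent, and fraction bits of a double from Python is sketched below. It uses the standard-library `struct` module to reinterpret the 8 bytes of a float as a 64-bit integer; the helper name `double_bits` is hypothetical and only for illustration.
+
+```python
+import struct
+
+def double_bits(x):
+    """Return the sign, exponent and fraction bit strings of a Python float."""
+    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer
+    (n,) = struct.unpack(">Q", struct.pack(">d", x))
+    bits = format(n, "064b")
+    # 1 sign bit, 11 exponent bits, 52 fraction bits
+    return bits[0], bits[1:12], bits[12:]
+
+for x in (1.0, -2.0, 0.1):
+    s, e, f = double_bits(x)
+    # int(e, 2) - 1023 is the power of two multiplying the mantissa
+    print(x, s, e, int(e, 2) - 1023, f)
+```
+
+For `1.0` the fraction bits are all zero and the exponent field is 1023, so the value is $(1+0)\times 2^{0}$.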
+
+
+```{admonition} Questions
+- What is the order of magnitude of the largest and smallest values that can be stored in a double?
+- What is the binary representation of the number 3.0?
+```
+
+
+We focused here on double precision numbers, which are the default floating point type in Python and NumPy. Single precision floats are 32 bits, with 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa.
+
+**Further reading:** [What every computer scientist should know about floating-point arithmetic](https://dl.acm.org/doi/10.1145/103162.103163) by David Goldberg.
+
+ 
+## Roundoff error
+
+Roundoff error occurs because floating point variables have finite precision, which means that many values cannot be represented exactly. You might think of irrational numbers (e.g. $\pi$) or recurring decimals such as $1/11=0.0909\dots$, but even $1/10=0.1$ doesn't have an exact representation in floating point, where the fraction is written in base 2.
+
+```{admonition} Exercise
+Here are some floating point expressions to evaluate to illustrate roundoff:
+- `(2**0.5)**2 - 2`
+- `1.1 + 2.2 - 3.3`
+- `0.1 == 0.10000001`
+- `0.1 == 0.10000000000000001`
+- `(0.7 + 0.1) + 0.3`
+- `0.7 + (0.1 + 0.3)`
+```
+
+The size of the roundoff error is set by the number of bits that we have available for the fraction. For double precision, the relative error is of order $2^{-52}\approx 2.2\times 10^{-16}$. This is a small number, which is good! But you do have to be careful in certain situations:
+
+- **Comparing floats.** Rather than comparing floats, it's better to instead test whether they are close to each other. E.g., with `x = 1.1 + 2.2`, `x==3.3` returns `False`, whereas `abs(x-3.3) < 1e-8`  returns `True`.
+
+- **Subtracting numbers that are almost equal.** This comes up in many problems where you are evaluating a physical quantity which is given by subtracting two terms that almost cancel. If the difference between the two numbers becomes comparable to the floating point precision, roundoff error can dominate the answer. In these cases, you can often rewrite the expressions to be evaluated to avoid the subtraction; an example is given below.
+
+- **When doing many operations and errors accumulate.** Small errors accumulate over many operations. An example is long-term integrations of planetary orbits, for example when trying to compute the future evolution of the Solar System. Finite precision in chaotic systems can have a big impact on the solution.
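+
+For the float comparison in the first point above, the standard library also provides `math.isclose`, which by default uses a relative tolerance. A minimal sketch of the difference between an absolute and a relative test (the values of `a` and `b` are arbitrary illustrative choices):
+
+```python
+import math
+
+x = 1.1 + 2.2
+
+print(x == 3.3)                  # False
+print(abs(x - 3.3) < 1e-8)       # True (absolute tolerance)
+print(math.isclose(x, 3.3))      # True (relative tolerance, default 1e-9)
+
+# An absolute tolerance can be misleading for very small numbers:
+a, b = 1e-10, 2e-10
+print(abs(a - b) < 1e-8)         # True, although b is twice as large as a
+print(math.isclose(a, b))        # False
+```
+
+For arrays, `np.isclose` and `np.allclose` do the same job, combining a relative tolerance (`rtol`) and an absolute tolerance (`atol`).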
+
+```{admonition} Exercise
+
+Consider the two forms of the function
+
+$$f(x) = {1\over\sqrt{1+x^2}-x} = \sqrt{1+x^2}+x$$
+
+One of these involves a subtraction, and one doesn't. Try evaluating and plotting these two expressions as a function of $x$. Do you see the effects of roundoff error at large values of $x$?
+
+```
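+
+The same kind of cancellation shows up in other expressions. As a separate illustration (not the function from the exercise), here is a minimal sketch using $1-\cos x$ for small $x$, which can be rewritten without a subtraction using the identity $1-\cos x = 2\sin^2(x/2)$. The value of `x` is an arbitrary choice.
+
+```python
+import math
+
+x = 1e-9
+
+# cos(x) differs from 1 by about 5e-19, far below the double precision
+# spacing near 1, so the subtraction loses all the information.
+naive = 1.0 - math.cos(x)
+
+# Mathematically identical form with no subtraction of nearly equal numbers
+stable = 2.0 * math.sin(x / 2.0) ** 2
+
+print(naive)     # 0.0
+print(stable)    # close to x**2 / 2 = 5e-19
+```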
+
+**Further reading**
+
+- Gezerlis, Chapter 2
+- [Floating Point Arithmetic: Issues and Limitations](https://docs.python.org/3/tutorial/floatingpoint.html) from the Python documentation
+- [Examples of floating point problems](https://jvns.ca/blog/2023/01/13/examples-of-floating-point-problems/)
+- [The Perils of Floating Point](http://www.indowsway.com/floatingpoint.htm)
+
+
+## NumPy data types
+
+NumPy has many different data types, which you can read about here:
+https://numpy.org/doc/stable/user/basics.types.html
+
+The default type for a floating point number is a 64-bit float, as we discussed above.
+
+```{admonition} Exercises
+
+Here are some things to try to investigate NumPy data types:
+
+- `np.finfo(np.float32).eps`
+- `np.finfo(np.float64).eps`
+- `a = np.ones(10); a.dtype`
+- `np.double(2.2) + np.double(1.1) - np.double(3.3)`
+- `np.single(2.2) + np.single(1.1) - np.single(3.3)`
+- `np.int_(2) ** 10;   2**10`
+- `np.int_(2) ** 100;   2**100`
+```
+
+Note that NumPy integers do not behave like Python integers! They have a fixed size in bytes (and therefore maximum and minimum values), whereas Python integers are objects that adapt their size to the precision needed.
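+
+The fixed range of a NumPy integer type can be inspected with `np.iinfo`. The sketch below assumes the 64-bit type `np.int64` explicitly (the size of `np.int_` can depend on the platform and NumPy version) and shows that integer arithmetic on arrays wraps around silently when the range is exceeded.
+
+```python
+import numpy as np
+
+info = np.iinfo(np.int64)
+print(info.min, info.max)          # -9223372036854775808 9223372036854775807
+
+a = np.array([info.max], dtype=np.int64)
+wrapped = a + 1                    # exceeds the int64 range and wraps around
+print(wrapped[0] == info.min)      # True
+
+# A Python integer simply grows to hold the result
+print(int(info.max) + 1)           # 9223372036854775808
+```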
+
+
+
+
+
+
+
--- a/outline.md
+++ b/outline.md
@@ -80,13 +80,22 @@ There are also other courses on computational physics that have materials online

 This will be an interactive class, with a mixture of lecture and problem solving during class time. *There will be coding exercises in class -- you should bring your laptop to class with you.*

+There will also be a weekly "*Debug den*", which will be an informal hackspace where you can work together and get help from the TAs. If there are topics you would like covered in a tutorial in these sessions, you can let the TAs know. 
+
 Your grade will be based on 

- Homeworks (30%), given out every 1-2 weeks during the term. The lowest homework score will be dropped when calculating the final grade. 
+- Homeworks (30%), given out every 1-2 weeks during the term. The lowest homework score will be dropped when calculating the final grade. More details on the grading scheme for homework will be given next week. Note that you can discuss homework questions with other students in the course, but you should write your own homework solutions and any code you hand in should be your own.
+
 - Project (30%). In teams of 2-3 students, the project component will involve developing a code to investigate a physics problem of interest to the student. The topic must be decided on and approved by the instructor by mid-October, and the project will be due at the end of term. More information and suggestions for topics will be provided in the first week of classes.
+
 - Take-home final exam (40%). A mixture of analytic and computational problems. The take-home exam will be available for a 72 hour period and designed to be completed in 3 hours.

-Lecture notes and assignments will be made available through this website. Homework and project submissions will be through each students private Github repository. Grades will be distributed in myCourses. There will be a Slack workspace for the class for students to collaborate on projects.
+We will use the following distribution and communication tools:
+- Lecture notes and assignments will be made available through this website
+- Homework and project submissions will be through each student's private Github repository
+- Grades will be distributed in myCourses
+- A Slack workspace is available for the class for students to collaborate on projects and to ask questions or discuss the homework. Note that discussions and questions about homework **must** be posted to the `#homework` channel (we will make a tool available to post questions anonymously if you prefer to do that). *Note that the TAs will not answer questions about homework via direct message in Slack; you must use the homework channel.*
+

 ## McGill policy statements

--- a/programming.md
+++ b/programming.md
@@ -0,0 +1,50 @@
+# Programming best practices
+
+No matter what your previous programming experience, hopefully you will learn a lot by doing the exercises and homeworks in this course. We'll have a discussion about some best practices for writing code in the first class.
+
+Here are a few things to keep in mind:
+
+- **Don't write code unless you have to**. Is there an existing library or open source code that does what you need to do? If so, you don't need to reinvent the wheel. For this course, we will use many routines from NumPy and SciPy. You can also search on Github to find open source codes.
+
+- **Readability is really important**. Even if you think that no one else will ever have to read your code, you will thank yourself when you come back to the code after a year and can't remember why you wrote it the way you did! Readability can mean including helpful comments (using `#` in Python), but is also helped by the code layout, variable naming etc. *Very important for this course*: you need to write readable code for the homeworks so that the TAs can understand it and grade it!
+
+- **Write once**, also known as DRY (don't repeat yourself). It's often tempting to cut and paste code when you have to do something multiple times. A simple example is when you are making multiple plots in a Matplotlib script. Instead, put the code into a function or inside a loop. Having one copy of the code in one place will save endless debugging headaches.
+
+- **If you get stuck, someone else almost certainly had the same problem.** Google, Stack Overflow, or AI code generators such as ChatGPT or Github Copilot will often quickly help you find the answer (but may also give you wrong answers, especially the AI models!). *Note that any code that you hand in for the course that is not your own needs to come with a citation. If you use an existing code snippet from Stack Overflow or ChatGPT, you need to cite your sources.*
+
+- **Use vector operations wherever possible**. Python is a high-level language and that can introduce a lot of overhead behind the scenes. For example, Python has to figure out the type of a variable at run time, because the type (integer, float, etc.) is not explicitly declared in Python code. That takes time, especially in a loop. Try the following exercise to see what I mean.
+
+```{admonition} Questions for discussion
+- What are the advantages and disadvantages of using existing code over writing it yourself?
+- What makes for a good comment? 
+- What are best practices for code layout or variable naming?
+- What other best practices have I missed?
+```
+
+
+````{admonition} Exercise
+Create an array of $N=10^6$ angles $\theta$ equally distributed between 0 and $2\pi$. Now calculate the corresponding vector $\sin\theta$ using (1) a for loop in which you loop over the theta values and calculate $\sin\theta$ for each one, and (2) the vector operation `np.sin(theta)`. How long does it take in each case?
+
+Paste your answer in this sheet:
+https://docs.google.com/spreadsheets/d/1nDgjUhGySHeA5bnI_73m4KB_fJnwJF5tyykO4cy8P3c/edit?usp=sharing
+
+
+**Hint:** To time your code, you can use 
+```
+import time
+
+t0 = time.time()
+
+# your code here
+
+t1 = time.time()
+print('That took', t1-t0, 'seconds')
+```
+(see also `timeit` for more sophisticated code timing).
+
+````
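+
+As an illustration of the `timeit` module mentioned in the hint, here is a minimal sketch that times a Python loop against a vector operation. It deliberately uses a different calculation from the exercise (a sum of squares), and the function names `sum_squares_loop` and `sum_squares_vector` are hypothetical.
+
+```python
+import timeit
+import numpy as np
+
+x = np.linspace(0.0, 1.0, 100_000)
+
+def sum_squares_loop(x):
+    """Sum of squares using an explicit Python loop."""
+    total = 0.0
+    for xi in x:
+        total += xi * xi
+    return total
+
+def sum_squares_vector(x):
+    """Sum of squares using NumPy vector operations."""
+    return np.sum(x * x)
+
+# timeit.repeat calls each function `number` times per measurement and
+# makes `repeat` measurements; taking the minimum reduces the influence
+# of other activity on the machine.
+t_loop = min(timeit.repeat(lambda: sum_squares_loop(x), number=1, repeat=3))
+t_vec = min(timeit.repeat(lambda: sum_squares_vector(x), number=1, repeat=3))
+
+print(f"loop:   {t_loop:.2e} s")
+print(f"vector: {t_vec:.2e} s")
+```
+
+The two functions add the terms in a different order, so their results can differ at the level of roundoff error even though both are correct.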
+
+**Further reading** There are many books on best programming practices and styles. One of the classics that was recently updated is
+
+- [The pragmatic programmer: your journey to mastery](https://mcgill.on.worldcat.org/search/detail/1112609085?queryString=ti%3A%28pragmatic%20programmer%29&databaseList=283%2C638&origPageViewName=pages%2Fadvanced-search-page&clusterResults=&groupVariantRecords=&expandSearch=true&translateSearch=false&queryTranslationLanguage=&lang=en&scope=wz%3A12129) by Thomas and Hunt (2020 second edition) (the link is to the McGill library ebook).
+