Notes on Python for Finance

Resources

The Quant Platform website: http://py4fi.pqp.io

Our company website http://tpq.io
My private website http://hilpisch.com
Our Python books website http://books.tpq.io
Our online training website http://training.tpq.io
The Certificate Program website http://certificate.tpq.io
Training program: http://pyalgo.tpq.io

Conventions Used in This Book

Italic: for terms, URLs, email addresses. Monospace: for program listings and technical elements (code, commands, filenames). Monospace and italic: for user-defined values.

![[Pasted image 20250303205954.png]]

Prep for coding

Creating an ad-hoc conda environment for this project.

Supplemental material (in particular, Jupyter Notebooks and Python scripts/modules) is available for usage and download at http://py4fi.pqp.io.

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Importing means making a package available to the current namespace and the current Python interpreter process.

Must get familiar with the NumPy ndarray and pandas DataFrame data structures.

Finance firms are spending trillions on tech.
Tech can go wild (e.g., the 2010 Flash Crash).

Chap 1 Why Python for Finance

Context w/ modern finance:

Firms have to invest heavily in tech to stay competitive (millions and millions).
The industry is becoming more and more tech savvy (data processing, analytic speed, theoretical foundations).
Real-time analytics.

About Virtual Environment

Course code repo: https://github.com/yhilpisch/py4fi2nd

Solutions:

Create an ad-hoc venv or conda environment for the project, on Linux distro. Regarding virtual environment, container, Anaconda: https://g.co/gemini/share/b853fe5cdef1
Use a MacBook, then follow: https://github.com/yhilpisch/py4fi2nd

My MacBook use guide link here

Part 1

Chap 1 Intro

Key takeaways:

Naming conventions in Python resemble those of the real world. This characteristic makes Python an efficient choice.
High abstraction and rigid implementation.
Firms might use Python all the way from prototyping to production.
Important theories like MPT and CAPM lack data-driven support; they were derived mostly from experience rather than data.
Outdated theories rely on weak assumptions.

General tips to improve efficiency with Python:

Use a simpler approach (fewer loops, more vectorization).
Use specialized packages to handle data
Use parallelization.

A proper treatment of AI-first finance, however, would require a book fully dedicated to the topic. This book only provides an entry-level understanding of how to apply AI to finance.

Conclusion

Python, with its elegant syntax, efficient development approaches, and versatility for both prototyping and production, stands as an ideal technological framework for the financial industry. Its extensive ecosystem of packages, libraries, and tools addresses the challenges posed by recent developments in finance, including analytics, data management, compliance, and technology. Python streamlines end-to-end development and production, and its dominance in AI, machine learning, and deep learning makes it the go-to language for data-driven and AI-first finance, which are reshaping the financial industry.

Chap 2 Deployment

This chapter covers techniques for Python deployment:

package managers
Virtual environment managers
containers
cloud instances

Getting familiar with `conda`

#syntax #code conda can serve as a package manager as well as an environment manager.

Getting familiar with `docker`

A Docker container is an isolated filesystem containing an OS (e.g., Ubuntu), Python runtime, tools, and libraries. It runs uniformly across platforms (e.g., Windows 10 or cloud Linux).

Build an Ubuntu Python `docker` image

Prep:

apt-get updates
Install conda, python and other OS necessary packages using install.sh
Install docker, docker-compose

Notes on `colima`, `docker`, `shell`, `conda`, and `python`

Scope:

For py4fi example illustration purposes only.

KIM:

conda can manage packages and environments.
docker can instantiate an environment in a container.
You can build a pre-defined image using docker build.

`conda` basics

#syntax

Install packages: conda install <name0> <name1> ... -y
Search packages: conda search <name>
Update packages: conda update <name>
Remove packages: conda remove <name>
List packages: conda list
Create a virtual env: conda create -n $ENVIRONMENT_NAME
Activate a virtual env: conda activate <env_name>
Deactivate a virtual env: conda deactivate
Remove a virtual env: conda env remove -n <env_name>
List virtual envs: conda env list
Export environment configs: conda env export > $FILE_NAME
Create a virtual env from a config file: conda env create -f $FILE_NAME

`shell` basics

#syntax #code

Check conda: conda --version
Check docker: docker --version
Check python: python --version

Package management for Debian-like OSes:

Update packages: apt-get update; apt-get upgrade -y
Install packages: apt-get install -y $PACKAGE_NAME

`colima` basics(Mac specific)

#syntax #code Goal: To have a minimalist working docker daemon on MacOS.

Using colima on MacOS to render a minimalist, vanilla docker daemon.

Installation: brew install colima
Create a docker daemon using colima: colima start -e # -e opens the config for editing (disk usage, memory assignment, CPU assignment, etc.)
Once colima has started, the docker daemon is up and running; continue with the docker commands.
Stop daemon: colima stop
Remove a daemon: colima delete <name>

Noteworthy configs:

Disk usage
Memory assignment
CPU assignment
Arch
Mount volume type
Mount point
Virtualization framework

`docker` basics

#syntax

List images: docker images
List all containers: docker ps -a
List running containers: docker ps
Run a container based on an image: docker run -it <image_name>:<release_tag>
Start a container: docker start <name_or_id>
Stop a container: docker stop <name_or_id>
Attach the session to a RUNNING docker container: docker attach <name_or_id>
Remove a container: docker rm <name_or_id>

Into the docker shell #syntax Goal: To install all necessary and sufficient packages for a python deployment(mainly conda and other python packages.)

List running containers: docker ps
List all containers: docker ps -a
Open an interactive shell in a simple container created from the ubuntu:latest image (this is not ssh): docker run -ti -h py4fi -p 11111:11111 ubuntu:latest /bin/bash
- Once inside the docker container, the rest is just Unix.
- Fetch miniconda.sh according to the arch, link here.
- Initialize conda using fetched miniconda.sh installation.
- Install some example packages like numpy, scipy, ipython, pandas, etc.

Building your own image

In an ad-hoc docker build directory, create two files, like shown in the book:

juan@juans-MacBook-Air docker-build % tree 
.
├── Dockerfile
└── install.sh
1 directory, 2 files

Note:

You are just creating an image, do not run the scripts in the host.
But images are created in the host.
Just cd into the directory and run docker build ....

Basic manipulations:

Build an image: docker build -t py4fi:basic .
Remove an image: docker rmi <name_or_id_of_the_image>

Going Cloud

Providers to consider:

DigitalOcean
AWS

Goal:

Apply the above to a cloud infrastructure.
Setup an online jupyter notebook.
Learn basic SSL (namely OpenSSL).
Develop against the Python deployment from any browser (maybe even a phone?).

Prep:

Use the book-provided scripts to initialize the deployment (DigitalOcean Droplet).
Get a VPS instance.
Refer to the official Jupyter Notebook docs to deploy (they pretty much sum up everything you need to know, and deployment is very easy and quick).
If using the code from the book, just follow it. Basic steps:
- Create an RSA key and a self-signed certificate: openssl req -x509 -nodes -days 365 -newkey rsa:1024 -out cert.pem -keyout cert.key
- Follow the onscreen instructions.
- Generate a hashed password with the passwd function from the notebook.auth module (part of the classic Jupyter Notebook package, not the Python standard library): passwd('replace_with_an_actual_password')
- Edit the Jupyter Notebook config file (e.g., with nano).
- Edit the installation script (e.g., with nano).
- Run the setup script that orchestrates the DigitalOcean Droplet.

How to use the book code:

cd into ../ch02/cloud
Gen a cert.key and cert.pem using the book provided openssl code.
Use the old_notebook_env environment to get a hashed password like the example in the book.

KIM:

Use the security measures provided by Jupyter Notebook (and its relatives JupyterLab and JupyterHub).
Follow the official guide to deploy JupyterHub, then onward to the book’s installation script. #syntax #code

[!code]- Click to show installation script

#!/bin/bash
#
# Script to Install
# Linux System Tools,
# Basic Python Packages and
# Jupyter Notebook Server
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
apt-get install -y bzip2 gcc git htop screen vim wget  # installs system tools
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache

# INSTALLING MINICONDA
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O \
  Miniconda.sh
bash Miniconda.sh -b  # installs Miniconda
rm Miniconda.sh  # removes the installer
# prepends the new path for current session
export PATH="/root/miniconda3/bin:$PATH"
# prepends the new path in the shell configuration
echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo "conda activate" >> ~/.bashrc

# INSTALLING PYTHON LIBRARIES
# More packages can/must be added
# depending on the use case.
conda update -y conda # updates conda if required
conda create -y -n py4fi python=3.7  # creates an environment
source activate py4fi  # activates the new environment
conda install -y jupyter  # interactive data analytics in the browser
conda install -y pytables  # wrapper for HDF5 binary storage
conda install -y pandas  #  data analysis package
conda install -y matplotlib  # standard plotting library
conda install -y scikit-learn  # machine learning library
conda install -y openpyxl  # library for Excel interaction
conda install -y pyyaml  # library to manage YAML files

pip install --upgrade pip  # upgrades the package manager
pip install cufflinks  # combining plotly with pandas

Conclusion

Key Python deployment solutions for finance include:

Conda Environments
- Create project-specific environments (py4fi2nd.yml) for dependency isolation and reproducibility.
Docker Containers
- Containerize environments to ensure consistency across development/production stages.
Cloud Infrastructure
- Leverage platforms like DigitalOcean for scalable, real-time analytics and code execution.

Part 2 On to Python

• Chapter 3 focuses on Python data types and structures.

• Chapter 4 is about NumPy and its ndarray class.

• Chapter 5 is about pandas and its DataFrame class.

• Chapter 6 discusses object-oriented programming (OOP) with Python.

Chap 3 Data Types and Data Structures

“Types” and “Structures” are not the same.

From here on, “TYPES” refers to “Data Types” in this chapter.

From here on, “STRUCTURES” refers to “Data Structures” in this chapter.

Basic Types: ![[Pasted image 20250401221249.png]]

Basic structures: ![[Pasted image 20250401221434.png]]

Good practices:

Use type() to check types.
Use ipython or other intellisense auto-completion to check for functions, classes, methods and the like.
Use dir() to check the complete list of attributes and methods of any object.

Floats

Fun but actually important fact: Python stores floats in memory in a binary format with a fixed number of bits (64-bit double precision on common platforms). Many decimal fractions between zero and one, such as 0.1, have no finite binary representation; they would need infinitely many bits. Because only a fixed number of bits is available, such a number is stored as the nearest representable value, and small inaccuracies occur. So, you might see this in ipython: ![[Pasted image 20250401222431.png]] The issue can be of importance when summing over a large set of numbers. In such a situation, a certain kind and/or magnitude of representation error might, in aggregate, lead to significant deviations from a benchmark value.
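To make the summation issue concrete, here is a minimal sketch using only the standard library. The choice of ten copies of 0.1 is an illustrative assumption, not an example taken from the book.

```python
import math
from decimal import Decimal

values = [0.1] * 10              # 0.1 has no exact binary representation

naive_total = sum(values)        # plain left-to-right float addition
exact_total = math.fsum(values)  # tracks partial sums to avoid losing precision
decimal_total = sum(Decimal("0.1") for _ in range(10))  # base-10 arithmetic

print(naive_total == 1.0)    # False: representation errors accumulated
print(exact_total == 1.0)    # True
print(decimal_total == 1)    # True
```

For long sums of floats, math.fsum is a cheap fix; Decimal is the heavier option when exact base-10 arithmetic is required.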

How to address this issue

Use the decimal package to specifically handle floats if accuracy is a priority.

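Basic example: #syntax #code

[!code]- Click to show basic example with decimal

import decimal
from decimal import Decimal
decimal.getcontext()  # show the current context (default precision: 28 digits)
Decimal(1) / Decimal(11)
decimal.getcontext().prec = 4  # lower the precision to 4 significant digits
Decimal(1) / Decimal(11)
decimal.getcontext().prec = 50  # raise the precision to 50 significant digits
Decimal(1) / Decimal(11)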

Bool

Comparison operators: <, >, <=, >=, !=, == : yield bool
Logic operators: and, or, not : yield bool when applied to bools
Any non-zero number evaluates to True.

>>> bool(0.0)
False
>>> bool(4214.2)
True
>>> bool(-432)
True

Good Practices

Play with it.

Strings

KIM:

Everything in Python is an object, meaning: every value belongs to a class and has methods to call upon.

E.g.: #code

>>> a_string = f"This is a string."
>>> a_string.strip(" his")
'This is a string.'
>>> a_string.strip(" T")
'his is a string.'
>>> a_string.strip(" t")
'This is a string.'
>>> a_string.replace(" ", "-")
'This-is-a-string.'
>>> a_string.
a_string.capitalize()    a_string.index(          a_string.isspace()       a_string.removesuffix(   a_string.startswith(     
a_string.casefold()      a_string.isalnum()       a_string.istitle()       a_string.replace(        a_string.strip(          
a_string.center(         a_string.isalpha()       a_string.isupper()       a_string.rfind(          a_string.swapcase()      
a_string.count(          a_string.isascii()       a_string.join(           a_string.rindex(         a_string.title()         
a_string.encode(         a_string.isdecimal()     a_string.ljust(          a_string.rjust(          a_string.translate(      
a_string.endswith(       a_string.isdigit()       a_string.lower()         a_string.rpartition(     a_string.upper()         
a_string.expandtabs(     a_string.isidentifier()  a_string.lstrip(         a_string.rsplit(         a_string.zfill(          
a_string.find(           a_string.islower()       a_string.maketrans(      a_string.rstrip(                                  
a_string.format(         a_string.isnumeric()     a_string.partition(      a_string.split(                                   
a_string.format_map(     a_string.isprintable()   a_string.removeprefix(   a_string.splitlines(

A brief list of methods for str() object:

`print()`

You can apply format strings to print().

`regex` (Regular Expression)

Good Practices

Play with it.

Structures

list : More flexible(Most of the time, working ONLY with list is sufficient.)
tuple : More rigid(Immutable)
dict
set

KIM:

Sequence structures (list, tuple, str) have a built-in positional index; dict is accessed by key, and set has no index.
The index uses 0-based indexing.
= assigns values, == compares values. #code

In [102]: l = [1, 2.5, 'data']
l[2]
Out[102]: 'data'
In [103]: l = list(t)  # t is the tuple (1, 2.5, 'data') defined earlier in the book
l
Out[103]: [1, 2.5, 'data']
In [104]: type(l)
Out[104]: list
In [105]: l.append([4, 3])
l
Out[105]: [1, 2.5, 'data', [4, 3]]
In [106]: l.extend([1.0, 1.5, 2.0])
l
Out[106]: [1, 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [107]: l.insert(1, 'insert')
l
Out[107]: [1, 'insert', 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [108]: l.remove('data')
l
Out[108]: [1, 'insert', 2.5, [4, 3], 1.0, 1.5, 2.0]
In [109]: p = l.pop(3)
print(l, p)
[1, 'insert', 2.5, 1.0, 1.5, 2.0] [4, 3]

Control Structures

help(range)

Typically: the for loop.
- for is typically used with list objects.
Counter-based loops (like i = 0; while i < n in other languages) are typically implemented in Python using a range object. You could also achieve the same thing with while.
- help on range:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
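As a small sketch of the two loop styles described above, the following builds the same list once with range and once with while. The bounds 2, 10, and 2 are arbitrary illustrative values.

```python
# Counter-based loop with range(start, stop, step); stop is exclusive
squares = []
for i in range(2, 10, 2):
    squares.append(i ** 2)

# The same loop written with while and a manual counter
squares_while = []
i = 2
while i < 10:
    squares_while.append(i ** 2)
    i += 2

print(squares)        # [4, 16, 36, 64]
print(squares_while)  # [4, 16, 36, 64]
```

The range version is shorter and cannot forget to increment the counter, which is why it is the usual choice.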

`List` comprehension in Python

E.g.:

In [117]: m = [i ** 2 for i in range(5)]
m
Out[117]: [0, 1, 4, 9, 16]

![[Pasted image 20250408190130.png]]

Good Practices

Play with it.
Keep loops as minimal as possible (use built-in functions, methods, lambda, map(), etc.)

Functional programming

Function Definition

def f(x[, argument1[, argument2...]...]):
	return x

Tools for functions

#code Help on map:

Help on class map in module builtins:
class map(object)
 |  map(function, iterable, /, *iterables)
 |
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.

Pay attention to the arguments of map. E.g.:

In [120]: list(map(even, range(10)))
Out[120]: [True, False, True, False, True, False, True, False, True, False]
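The session above calls a function named even that these notes do not define. A minimal sketch of what it presumably looks like (the body is an assumption that reproduces the output shown):

```python
def even(x):
    """Return True if x is even (assumed helper matching the session above)."""
    return x % 2 == 0

# map applies even to each element lazily; list() materializes the results
flags = list(map(even, range(10)))
print(flags)
```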

Anonymous function: `lambda`

Example with lambda:

lambda input: output_of_input
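A short sketch of the lambda pattern in practice. The names square, squares, and by_last_char are illustrative only.

```python
# A lambda is an unnamed, single-expression function
square = lambda x: x ** 2          # input: x, output: x ** 2
print(square(4))                   # 16

# Typical use: pass it inline to map() or as a sort key
squares = list(map(lambda x: x ** 2, range(5)))
by_last_char = sorted(["ab", "ca", "bc"], key=lambda s: s[-1])
print(squares)        # [0, 1, 4, 9, 16]
print(by_last_char)   # ['ca', 'ab', 'bc']
```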

`filter`

E.g., filter an iterable for its even elements:

In [13]: list(filter(lambda x: x % 2 == 0, range(10)))
Out[13]: [0, 2, 4, 6, 8]

Good Practices

Play with it.
Keep loops as minimal as possible, even though the looping is then only implicit (use built-in functions, methods, lambda, map(), etc.)

`dict`s

dict objects
mutable, like list
concept of key-value pair
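insertion-ordered since Python 3.7 (older versions and the book treat them as unordered)
not sortable in place (use sorted() on the keys or items instead)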
defined using {}
has built-in methods, like any other object.

Methods of dict: #code ![[Pasted image 20250409130809.png]]
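A minimal sketch of everyday dict usage, assuming made-up quote data (the symbol and prices are placeholders, not real market data):

```python
quote = {"symbol": "AAPL", "price": 170.0}   # made-up example data
quote["currency"] = "USD"                    # dicts are mutable: add a pair
quote["price"] = 171.5                       # overwrite a value by key

print(list(quote.keys()))       # ['symbol', 'price', 'currency']
print(quote.get("volume", 0))   # 0: default when the key is missing

# A dict cannot be sorted in place, but its keys or items can be
sorted_keys = sorted(quote)
print(sorted_keys)              # ['currency', 'price', 'symbol']

for key, value in quote.items():
    print(key, type(value).__name__)
```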

`set`s

not too many applications(not typical)
unordered
collections of other objects
trimmed elements(every element is unique)
support operations from mathematical set theory (union, intersection, difference)
one application is to get rid of duplicates in a list object
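The two uses named above, de-duplication and set-theory operations, can be sketched as follows. The ticker symbols are placeholder data.

```python
trades = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]   # made-up tickers

# De-duplicate a list; sorted() gives a reproducible order back
unique_sorted = sorted(set(trades))
print(unique_sorted)          # ['AAPL', 'GOOG', 'MSFT']

a = {"AAPL", "MSFT"}
b = {"MSFT", "GOOG"}
print(a | b)   # union: elements in either set
print(a & b)   # intersection: {'MSFT'}
print(a - b)   # difference: {'AAPL'}
print(a ^ b)   # symmetric difference: elements in exactly one set
```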

Conclusion

Basic data types: int, float, bool, and str serve as atomic types.
Standard data structures: tuple, list, dict, and set are widely applicable, with list being particularly flexible for diverse financial use cases.

Chap4: NumPy

#numpy numpy expands data structures to arrays.

Arrays

But first, key downsides of using list:

high memory usage
slow performance

For real applications, arrays prevail.

In the more common case, an array represents an i × j matrix of elements.

numpy specializes in arrays.

Get started with array

Python has a built-in array module that handles arrays with very limited functionality.

import array

A simple list object is considered a 1d array:

In [1]: v = [0.5, 0.75, 1.0, 1.5, 2.0]

A nested list object (n-dimensional array…):

In [2]: m = [v, v, v]
m
Out[2]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]

Elements are stored by reference by default, meaning: if you create a list object from another list object (e.g., m = [v, v, v]), the created list changes whenever the other list object is mutated. To prevent this, use the deepcopy function from the copy module.
array objects (from the built-in array module) have some basic built-in file operations (like storing to a file, etc.).
array objects can be converted to list if needed.
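A minimal sketch of the reference behavior and the deepcopy fix. The variable names shared and independent are illustrative; the list comprehension is one possible way to get fully independent rows.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0]
shared = [v, v, v]        # three references to the same list object
v[0] = "Python"           # mutating v ...
print(shared[1][0])       # ... shows up in every row: 'Python'

v = [0.5, 0.75, 1.0]
independent = [deepcopy(v) for _ in range(3)]   # three separate copies
v[0] = "Python"
print(independent[1][0])  # 0.5: the copies are unaffected
```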

`numpy` `array`s

Basics:

#code #numpy

In [28]: import numpy as np
In [29]: a = np.array([0, 0.5, 1.0, 1.5, 2.0])
a
Out[29]: array([0. , 0.5, 1. , 1.5, 2. ])
In [30]: type(a)
Out[30]: numpy.ndarray
In [31]: a = np.array(['a', 'b', 'c'])
a
Out[31]: array(['a', 'b', 'c'], dtype='<U1')
In [32]: a = np.arange(2, 20, 2)
a
Out[32]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [33]: a = np.arange(8, dtype=float)  # np.float was removed in NumPy 1.24; use float
a
Out[33]: array([0., 1., 2., 3., 4., 5., 6., 7.])
In [34]: a[5:]
Out[34]: array([5., 6., 7.])
In [35]: a[:2]
Out[35]: array([0., 1.])

numpy.ndarray has a lot of built-in methods to provide insights on statistics, computation, manipulation, etc.
Operations executed upon ndarrays are usually vectorized (refer to: On Vectorization), which is more concise and intuitive than looping over vanilla list objects.
For computations on single float values, the math module beats numpy.
numpy functions are universal, meaning they can also be applied to basic Python data types (int, float, list).

What `universal` means:

In [1]: import numpy as np
In [16]: b = [1,2,3]
In [17]: np.sqrt(b)
Out[17]: array([1.        , 1.41421356, 1.73205081])

non-vectorized vs vectorized

![[Pasted image 20250409143616.png]]
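As a sketch of the contrast, the same computation written both ways. The expression 2 * x + 3 and the input values are arbitrary illustrative choices.

```python
import numpy as np

values = [0.5, 1.0, 1.5, 2.0]

# Non-vectorized: explicit Python loop over a list
looped = [2 * x + 3 for x in values]

# Vectorized: one expression applied to the whole ndarray at once
arr = np.array(values)
vectorized = 2 * arr + 3

print(looped)       # [4.0, 5.0, 6.0, 7.0]
print(vectorized)   # [4. 5. 6. 7.]
```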

`math` vs `numpy`

![[Pasted image 20250409143125.png]]

`np.exp()` vs `**`

In general, np.exp(x) computes the natural exponential e**x element-wise, while x ** n raises x to the power of n. #code #numpy

In [1]: import numpy as np
In [2]: help(np.exp)
In [3]: a = [0, 25, 50, 75, 100]
In [4]: print(np.exp(a))
[1.00000000e+00 7.20048993e+10 5.18470553e+21 3.73324200e+32
 2.68811714e+43]
In [5]: print(np.exp(a, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 print(np.exp(a, 2))
TypeError: return arrays must be of ArrayType
In [6]: help(np.exp)
In [7]: a ** 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 a ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
In [8]: a = np.array(a)
In [9]: a
Out[9]: array([  0,  25,  50,  75, 100])
In [10]: a ** 2
Out[10]: array([    0,   625,  2500,  5625, 10000])

Multiple dimensions

![[numpy_ndarray.png]] ![[SCR-20250410-mjlt.png]]

numpy’s dtypes: ![[Pasted image 20250410194734.png]]

`np.linspace`

Creates a one-dimensional ndarray object with evenly spaced intervals between numbers; parameters used are start, stop, and num (number of elements).
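A quick sketch of np.linspace with illustrative arguments. Note that the end point is included by default.

```python
import numpy as np

# 11 evenly spaced points from 5 to 15, both ends included
grid = np.linspace(5, 15, 11)
print(grid)
print(grid.shape)   # (11,)
```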

Key Takeaways(for `ndarray` objects):

Metainformation (arr is an ndarray object)
- arr.size
- arr.itemsize
- arr.ndim
- arr.shape
- arr.dtype
- arr.nbytes
Reshaping
- KIM:
  - Generally, reshaping only returns another view of the same data;
  - resizing creates a new (temporary) object
  - You can assign the reshaped result to a new or existing variable if that’s desired.
- Reshaping:
  - np.arange()
  - arr.shape
  - np.shape(arr)
  - arr.reshape((row, col))
    - assign to a new variable: new_arr = arr.reshape((row, col))
  - Transpose
    - new_arr.T # Turns rows into cols and cols into rows
    - new_arr.transpose()

During a reshaping operation, the total number of elements in the ndarray object is unchanged. During a resizing operation, this number changes: it either decreases (“down-sizing”) or increases (“up-sizing”).

Resizing:
- np.resize(arr, (new_row, new_col))
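The difference between reshaping and resizing can be sketched as follows. The array of 15 integers and the target shapes are arbitrary illustrative choices.

```python
import numpy as np

g = np.arange(15)
h = g.reshape((3, 5))            # same 15 elements, new shape
print(np.shares_memory(g, h))    # True: reshape returned a view here
print(h.T.shape)                 # (5, 3): transpose swaps rows and cols

down = np.resize(g, (2, 3))      # down-sizing: keeps the first 6 elements
up = np.resize(g, (4, 5))        # up-sizing: repeats elements to fill 20 slots
print(down)
print(up[3])                     # [0 1 2 3 4]
```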

The size of the connecting dimension must be the same for stacking operations

Stacking:
- horizontal stacking: np.hstack((<ndarrays>, <operations>))
- vertical stacking: np.vstack((<ndarray_objects>, <operations>))
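A minimal stacking sketch with a small illustrative array; the operations 2 * h and 0.5 * h stand in for the "operations" placeholder above.

```python
import numpy as np

h = np.arange(6).reshape((3, 2))   # [[0, 1], [2, 3], [4, 5]]

side = np.hstack((h, 2 * h))       # row counts must match: result is (3, 4)
top = np.vstack((h, 0.5 * h))      # column counts must match: result is (6, 2)

print(side)
print(top)
```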

Flattening is to reduce ND ndarray object to a 1D object. It can happen row-by-row or col-by-col.

Flattening:
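- h.flatten(order='C') or h.flatten(order='F') # 'C' flattens row by row, 'F' column by column.
- The .flat attribute is an iterator that scans element by element; .ravel(order='C') or .ravel(order='F') returns the flattened array in the given order (as a view when possible).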
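A short sketch of the flattening variants on a small illustrative array:

```python
import numpy as np

h = np.arange(6).reshape((2, 3))       # [[0, 1, 2], [3, 4, 5]]

by_rows = h.flatten(order="C")         # row by row: 0 1 2 3 4 5
by_cols = h.flatten(order="F")         # column by column: 0 3 1 4 2 5
print(by_rows, by_cols)

# flatten always copies; ravel returns a view when it can
print(np.shares_memory(h, h.flatten()))   # False
print(np.shares_memory(h, h.ravel()))     # True

# .flat iterates over the elements in row-major order
print([int(x) for x in h.flat])
```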

Comparison and logical operators work on ndarray objects element wise.
Boolean arrays can be used for indexing and data selection.

Boolean Arrays
- arr <op> arr2 with <op> being one of <, >, <=, >=, ==, != (conditions can be combined with & and |) returns a new ndarray of bools.
- Important function: np.where(<condition>, <value_when_true>, <value_when_false>). The values to assign can be of any basic TYPE.
- bool_ndarray.astype(int) # Converts True|False to 1|0.

![[Pasted image 20250410230739.png]]
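Boolean indexing and np.where can be sketched like this. The thresholds 4 and 12 and the labels 'even' and 'odd' are illustrative choices.

```python
import numpy as np

h = np.arange(15).reshape((3, 5))

mask = (h > 4) & (h <= 12)       # element-wise comparison gives a bool array
print(h[mask])                   # selects 5 through 12 as a 1D array

labels = np.where(h % 2 == 0, "even", "odd")   # pick a value per element
print(labels[0])

print(mask.astype(int).sum())    # True/False as 1/0: counts the matches
```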

Speed: numpy generally wins, if not always.

Structured arrays: NumPy allows you to have a different dtype PER column. #code

import numpy as np
# Three ways to define the dtype
# 1. Define explicitly
dt = np.dtype([
    ('Name', 'S10'),         # 1. 'Name' field with byte-string data type of maximum length 10
    ('Age', 'i4'),           # 2. 'Age' field with 4-byte (32-bit) integer data type
    ('Height', 'f'),         # 3. 'Height' field with single-precision (32-bit) float data type
    ('Children/Pets', 'i4', 2)  # 4. 'Children/Pets' field with a shape (2,) array of 4-byte integers
])
# 2. Define in a more readable way
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'],
               'formats': 'O int float int,int'.split()})
# Notice `formats` as the key of the dtype, values are the dtypes to be defined.
# 3. Define in a more readable way, explicitly. Since `split` essentially returns a list.
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'],
               'formats': ['O', 'int', 'float', 'int,int']})  # Notice the composite 'int,int' dtype.

# Values for the composite 'int,int' field are passed as tuples
data = np.array([
    ('Alice', 30, 5.5, (2, 1)),
    ('Bob', 25, 6.0, (0, 2))
], dtype=dt)

Define dtype in a funnier way(Does not work if there is only 1 field…):

In [27]: dt = np.dtype({
    ...:     'names': ['Name','/Age/Children/Pets/Houses/Income/Height'],
    ...:     'formats': ['O','int,int,int,int,int,float']
    ...: })
In [28]: s = np.array([('Smith', (45, 0, 2, 1, 100, 1.83))], dtype=dt)
In [29]: s
Out[29]: 
array([('Smith', (45, 0, 2, 1, 100, 1.83))],
      dtype=[('Name', 'O'), ('/Age/Children/Pets/Houses/Income/Height', [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '<f8')])])

Indexing w/ dtype:
- arr["Name_of_the_Col"].methods
- arr[<the_index_of_rows>]
- arr[<row>][<"Col">]
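The three indexing patterns can be sketched on a small structured array. The field names and the two records are placeholder data chosen for illustration.

```python
import numpy as np

dt = np.dtype([("Name", "U10"), ("Age", "i4"), ("Height", "f8")])
people = np.array([("Smith", 45, 1.83), ("Jones", 53, 1.72)], dtype=dt)

names = people["Name"]                  # whole column by field name
mean_height = people["Height"].mean()   # ndarray methods work on a column
first = people[0]                       # whole record by row index
age_of_second = people[1]["Age"]        # single value: row, then field

print(names, mean_height, first, age_of_second)
```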

Key Takeaways for dtype:

It’s like a template to tell the to-be-assigned data, “We want this type of format.”
Like a SQL database.
Opens access for the data to be searched by names and indexes.

Vectorization allows computation to be faster. Basic arithmetic operators act element-wise by default. You can combine an array with an array of the same shape (element-wise operations) or an array with a single number (the scalar is applied to every element).

More on numpy’s broadcasting and its handling

Click to View

Key points on matrices in linear algebra versus NumPy:

Operation Distinction: NumPy uses distinct operators: * for element-wise multiplication (Hadamard product) and @ or np.dot() for linear algebra’s matrix multiplication.
Matrix Multiplication (@, np.dot): Both linear algebra and NumPy require matching inner dimensions (e.g., (m x n) @ (n x p)). Order is crucial (in general, AB ≠ BA). NumPy flexibly treats 1D arrays as appropriate row or column vectors to satisfy this rule.
Element-wise Multiplication (*):
- In linear algebra (Hadamard product), this strictly requires matrices to have the exact same shape. Order doesn’t matter.
- In NumPy, * performs this element-wise operation. If shapes differ but are compatible, NumPy uses broadcasting to virtually align them without needing identical shapes. Order doesn’t matter.
Broadcasting: This is a NumPy mechanism for element-wise operations (*, +, etc.) to handle arrays of compatible but different shapes efficiently, without making data copies. It does not relax the inner-dimension rule of matrix multiplication (@); @ only broadcasts the leading (batch) dimensions of stacked matrices.
Shape Flexibility: Linear algebra has rigid shape requirements (identical for element-wise, matching inner dimensions for matrix multiplication). NumPy is more flexible primarily due to broadcasting for element-wise operations and its adaptive handling of 1D arrays in matrix multiplication.
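The distinction between * and @ can be checked on two small matrices. The numbers are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
row = np.array([10, 20])           # 1D array, shape (2,)

print(A * B)     # element-wise: [[5, 12], [21, 32]]
print(A @ B)     # matrix product: [[19, 22], [43, 50]]
print(B @ A)     # different result: order matters for @

print(A * row)   # broadcasting: row is applied to every row of A
print(A @ row)   # row is treated as a column vector: [50, 110]
```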

Review on linear algebra

Memory Layout: The memory layout of an ndarray object doesn’t affect the overall sum calculation. However, summing over rows and columns is faster with C-ordered arrays. Specifically, summing over rows is relatively faster than summing over columns in C-ordered arrays, while the opposite is true for F-ordered arrays.

Conclusion

NumPy is the preferred Python package for numerical computing, offering the efficient ndarray class and vectorized operations that minimize slow Python loops. These techniques are also applicable to pandas DataFrames.

Chap5 `pandas`

#code #pandas

![[Pasted image 20250411211554.png]]

Denote DataFrame as df for the following context.

df is the core of pandas.

Key Takeaways:

pandas manages indexed and labeled data.
Data can be in basic TYPES and ndarrays.
Data can be organized in cols with names.
The index can consist of numbers, str, or datetime values.
Vectorization is almost always, if not always, faster than loops.
Applying vectorization explicitly (e.g., df ** 2) is almost always, if not always, faster than applying it implicitly (e.g., df.apply(lambda x: x ** 2)).
pandas supports some intuitive data manipulations, such as appending a col to the existing df (df['a_new_col'] = (<new_col_value1>, <new_col_value2>)).
It’s always a good practice to explicitly assign an index value when appending new data.
Good practice to instantiate a DataFrame object:
1. instantiate a np.ndarray
2. instantiate the DataFrame from the np.ndarray
3. then assign column names like df.columns = ['col1', 'col2', ...]
For financial research purposes, time is of the essence.
In general, scalar operations are applied to arrays and DataFrame objects element-wise. There are specific built-in methods for vector (matrix) manipulations.
pandas provides a wrapper around matplotlib specifically for DataFrame.

In [6]: %time df.apply(lambda x: x ** 2)
CPU times: user 1.05 ms, sys: 35 μs, total: 1.08 ms
Wall time: 1.09 ms
Out[6]: 
   numbers
a      100
b      400
c      900
d     1600
In [7]: %time df ** 2
CPU times: user 350 μs, sys: 22 μs, total: 372 μs
Wall time: 379 μs
Out[7]: 
   numbers
a      100
b      400
c      900
d     1600
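The three-step instantiation recipe from the key takeaways can be sketched as follows. The column names, the start date, and the daily frequency are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# 1. instantiate an ndarray, 2. build the DataFrame, 3. name the columns
data = np.arange(12.0).reshape((4, 3))
df = pd.DataFrame(data)
df.columns = ["No1", "No2", "No3"]

# A datetime index is the usual choice for financial time series
df.index = pd.date_range("2019-01-01", periods=4, freq="D")

df["No4"] = df["No1"] * 2          # append a column by plain assignment

explicit = df ** 2                          # vectorized operator
implicit = df.apply(lambda x: x ** 2)       # same result, usually slower
print(explicit.equals(implicit))            # True
```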

#syntax Core methods:

import pandas as pd
df = pd.DataFrame()
df.index
df.columns = ['col1', 'col2', ...]
df.loc
df.iloc
df.sum
df.apply
df + scalar, df - scalar, df * scalar, df / scalar
df['col']['row_label']
pd.concat
df.append (removed in pandas 2.0; use pd.concat instead)
df.mean
df.std
df.values
pd.date_range
np.array(df)
df.info
df.describe
df.cumsum
np.mean(df)
np.log(df)
np.sqrt(abs(df))

![[Pasted image 20250414112934.png]] ![[Pasted image 20250414113505.png]] ![[Pasted image 20250414114632.png]]

The `Series` Class

A series object is a single column of data from within a DataFrame object.

Key takeaways

#code #syntax #pandas

Instantiation: s = pd.Series(<ndarray>); s = df['col']
Basic DataFrame methods apply to Series objects as well.
Comparison/logical operators can be applied to DataFrame.
This enables data selection by complex conditions, like:
- df['col'] > value (conditions combined with & and |) returns bools
- df[<bool_condition>] or df.query('criteria_that_returns_bool') returns the rows that match the criteria
- Very important: df.query()
- df.append (removed in pandas 2.0; use pd.concat instead)
- pd.concat
- df.join (combines columns by index values, as defined by the how flag: 'left' keeps the index of the 1st df, 'right' keeps the index of the 2nd df, 'inner' keeps only the index values common to both (intersection), 'outer' keeps all index values from both (union))
- pd.merge(df1, df2, on='something_common')
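A compact sketch of selection, join, concat, and merge on toy data. All frame names, column names, and values are placeholders chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10.0, 20.0, 30.0, 40.0]},
                  index=list("abcd"))

mask = (df["x"] > 1) & (df["y"] < 40)       # Series of bools
selected = df[mask]                         # rows where mask is True
queried = df.query("x > 1 and y < 40")      # same selection as a string

df1 = pd.DataFrame({"A": [100, 200, 300]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"B": [1, 2, 3]}, index=["b", "c", "d"])
left = df1.join(df2, how="left")     # index of df1; missing B becomes NaN
inner = df1.join(df2, how="inner")   # intersection of the indexes
outer = df1.join(df2, how="outer")   # union of the indexes
stacked = pd.concat((df1, df2))      # rows of df2 placed below df1

c1 = pd.DataFrame({"key": ["k1", "k2"], "A": [1, 2]})
c2 = pd.DataFrame({"key": ["k2", "k3"], "B": [3, 4]})
merged = pd.merge(c1, c2, on="key")  # matches on a shared column
print(selected, inner, outer, merged, sep="\n\n")
```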

Groupby

Instantiation: groups = df.groupby('col')
Grouping by multiple columns: groups = df.groupby(['col1', 'col2', ...])
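A minimal groupby sketch; the Quarter and Odd_Even columns and their values are made-up illustration data.

```python
import pandas as pd

df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q1", "Q2"],
    "Odd_Even": ["Odd", "Even", "Odd", "Odd"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

groups = df.groupby("Quarter")            # one group per distinct Quarter
means = groups["value"].mean()            # aggregate per group
print(groups.size())
print(means)

# Grouping by several columns yields a MultiIndex result
multi = df.groupby(["Quarter", "Odd_Even"])["value"].sum()
print(multi)
```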

Performance

TL;DR: #KIM

Working with the columns (Series objects) directly is the fastest approach. “Directly” means no callables and no iterables, just plain df['col'] expressions.
np.ndarray is faster than pd.DataFrame.
Calling apply with lambda functions, or anything else that loops over all data entries in a df, is almost ALWAYS the SLOWEST approach.

Conclusion

Pandas is a central tool in the PyData ecosystem, offering the DataFrame class for efficient tabular data manipulation. It supports vectorized operations for concise, high-performance code and provides robust handling of incomplete datasets. Pandas will be extensively utilized in subsequent chapters, introducing additional features as needed.

Chap6: OOP

TL;DR

#KIM

OOP is the go-to approach in finance.
OOP suits abstract problem domains like finance: modeling with objects is more intuitive, more structured, more human-readable, less complex…
GPT summarized: Object-Oriented Programming (OOP) aligns with natural human thinking, reduces complexity, and enables modular, abstract, and reusable code. It supports features like inheritance, encapsulation, polymorphism, aggregation, and composition, enhancing flexibility, maintainability, and user interface design. OOP is also the dominant paradigm in Python, promoting nonredundant, efficient software development.
Everything in Python is an object, and every object has attributes, methods, etc.

Glossary: #KIM

Class: A blueprint for a group of objects. (Human)
Object: An instance of a class. (Juan Pan)
Attribute: A feature of the class or of one of its objects. (Juan has dark eyes)
Method: A function that belongs to the class.
Parameters: Inputs for the method.
Instantiation: the process of creating a specific object based on an abstract class.

E.g.:

class HumanBeing(object):  
    def __init__(self, first_name, eye_color):  
        self.first_name = first_name  
        self.eye_color = eye_color  
        self.position = 0  
    def walk_steps(self, steps):  
        self.position += steps

Syntax

#syntax #code Attributes are like variables for different scopes.

![[Pasted image 20250414233922.png]] ![[Pasted image 20250414234022.png]]

Python Data Model(Very Important)

The Python Data model allows one to design classes that consistently interact with basic language constructs of Python, including:

Iteration
collection handling
attribute access
operator overloading
function and method invocation
object creation and destruction
string representation
managed contexts

#syntax #code Special methods (names with leading and trailing double underscores, a.k.a. dunder methods; they are not private, Python calls them implicitly, e.g. len(obj) calls obj.__len__()):

__init__
__repr__
__add__
__mul__
__bool__
__len__
__getitem__
__iter__: returns an iterator
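
A compact sketch of a class that implements the special methods listed above. The `Vector` class is a hypothetical example written for these notes.

```python
class Vector:
    """Hypothetical 3D vector used only to illustrate special methods."""

    def __init__(self, x=0, y=0, z=0):          # object creation
        self.x, self.y, self.z = x, y, z

    def __repr__(self):                         # string representation
        return f"Vector({self.x}, {self.y}, {self.z})"

    def __add__(self, other):                   # v + w
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, scalar):                  # v * 2
        return Vector(self.x * scalar, self.y * scalar, self.z * scalar)

    def __bool__(self):                         # truth value: any nonzero entry
        return any((self.x, self.y, self.z))

    def __len__(self):                          # len(v)
        return 3

    def __getitem__(self, i):                   # v[0]
        return (self.x, self.y, self.z)[i]

    def __iter__(self):                         # for c in v: ...
        return iter((self.x, self.y, self.z))


v = Vector(1, 2, 3)
w = v + v * 2   # calls __mul__ first, then __add__
```

Python calls these methods behind the scenes: `print(w)` uses `__repr__`, `list(v)` uses `__iter__`, and `if v:` uses `__bool__`.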

Conclusion

#code #syntax ![[Pasted image 20250414235631.png]] This chapter introduces object-oriented programming (OOP) in Python, highlighting its theoretical foundations and practical applications. OOP enables the modeling of complex systems through custom objects that integrate seamlessly with Python’s flexible data model. While some critique OOP, it offers powerful tools for managing complexity and abstraction. The derivatives pricing package discussed in Part V exemplifies a scenario where OOP is the most suitable paradigm to address intricate requirements.

Part 3 Getting Started with Quant

Index:

Chap7: plotting
chap8: using pandas handle time-series data.
chap9: I/O right and fast
chap 10: code performance
chap 11: math
chap 12: implement methods from stochastics.
chap 13: statistical and machine learning approach.

Chap 7: Visualization

Tools:

matplotlib
plotly

KIM:

matplotlib can parse ndarray objects(to a point.)
Use styles.

![[Pasted image 20250415135738.png]] #syntax Switches and readability:

plt.xlim(min, max): limits of the x axis
plt.ylim(min, max): limits of the y axis
plt.title
plt.xlabel
plt.ylabel
pass color as an argument: plt.plot(x, y, 'r') or plt.plot(x, y, color='red')
plt.legend()

![[Pasted image 20250415140051.png]] ![[Pasted image 20250415140159.png]] ![[Pasted image 20250415140637.png]]

Going 2D & 3D

#KIM Be mindful of:

Scaling
- 1st Approach: use two y-axes(left/right, share the same x-axis)
- 2nd approach: use two subplots(upper/lower, left/right)
Separate styles
matplotlib can parse sub-datasets, but to a point.
Sometimes it is necessary to visualize the same data in different ways simultaneously.
Line and point plots are the most important ones in finance.
Scatter plots can be used to compare the returns of two assets.
Important: Histogram can be used to visualize financial returns.
3D plots can be used to visualize volatility surfaces.

Interactive

Dependent modules:

plotly
cufflinks
Use styles and plotting types. #syntax #code cf’s useful methods:
**add_annotations**
**add_atr**
**add_bollinger_bands**
**add_cci**
**add_dm**
**add_ema**
**add_macd**
**add_ptps**
**add_resistance**
**add_rsi**
**add_shapes**
**add_sma**
**add_support**
**add_trendline**
**add_volume**
iplot

Conclusion

Good practices:

Always consult the gallery #matplotlib first, then start with the example code.

A lot of #resources:

Chap 8: Financial Time Series

#KIM :

Financial time series data is one of the most important types of data in finance.
Time, time, time.

But first, DATA.

#KIM :

Thomson Reuters (TR) Eikon Data API
- Reuters Instrument Codes (RICs)
Good #practices:
- First, import data from csv.
- Take a first look at the data(inspecting / visualizing)
  - data.head()
  - data.tail()
  - data.plot()
- Have some basic statistics, check data validity, etc.:
  - data.info()
  - data.describe()
  - data.mean()
  - data.aggregate([min, max])
  - df.round()
  - np.mean
  - np.std
  - np.median
- Changes over time:
  - data.diff()[.head()]: absolute changes in value.
  - data.pct_change()[.round(<decimal_places>)[.head]]: Usually percentage changes are preferred.
  - rets = np.log(data/data.shift(1)): Log returns are preferred as well.
  - #KIM : Check log returns before any analysis happens.
- Resampling:
  - Downsampling: tick data resampled to 1-minute intervals OR daily data resampled to monthly intervals.
- Rolling statistics A.K.A financial indicators #Extremely_important for technical analysis.
  - data.rolling(...).min()
  - data.rolling(...).max()
  - data.rolling(...).std()
  - data.rolling(...).median()
  - data.ewm(...).mean()
  - custom indicators using the .apply method.
SMA strat: go long when the shorter-term SMA is above the longer-term SMA and short when it is below, meaning:
- 1. Like a flip switch, trades only take place when the two SMA lines intersect.
- 2. Only a few trades will happen over the years.
- 3. SMAs are used to derive positions to implement a trading strat; they are a means to an end.
Correlation: S&P vs. VIX as an example: choose plots with different scalings when dealing with this kind of problem (correlation).
- #KIM Correlation is NEVER Causation.
OLS Regression(Linear Regression):
- reg = np.polyfit(rets['.SPX'], rets['.VIX'], deg=1)
- ax = rets.plot(kind='scatter', x='.SPX', y='.VIX', figsize=(10, 6))
- ax.plot(rets['.SPX'], np.polyval(reg, rets['.SPX']), 'r', lw=2);
- pd.DataFrame.corr: Compute pairwise correlation of columns, excluding NA/null values.
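
A sketch of the steps above (log returns, rolling SMAs, positions) on a synthetic price series. The window lengths 42 and 252 and all column names are assumptions chosen for illustration; no real market data is used.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (a random walk); not market data.
rng = np.random.default_rng(42)
idx = pd.date_range("2020-01-01", periods=500, freq="B")
price = pd.Series(100 * np.exp(np.cumsum(0.01 * rng.standard_normal(500))), index=idx)

data = pd.DataFrame({"price": price})
data["rets"] = np.log(data["price"] / data["price"].shift(1))  # log returns
data["SMA1"] = data["price"].rolling(window=42).mean()         # shorter-term SMA
data["SMA2"] = data["price"].rolling(window=252).mean()        # longer-term SMA
data = data.dropna()                                           # drop warm-up rows

# +1 = long, -1 = short; the position flips only when the SMAs cross.
data["position"] = np.where(data["SMA1"] > data["SMA2"], 1, -1)
trades = int((data["position"].diff().fillna(0) != 0).sum())

# Strategy log returns: yesterday's position applied to today's return.
data["strategy"] = data["position"].shift(1) * data["rets"]
```

Summing `data["rets"]` and `data["strategy"]` and applying `np.exp` gives the gross performance of buy-and-hold versus the SMA strategy over the sample.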

This chapter introduces financial time series—crucial datasets in finance—and highlights how the pandas library facilitates their analysis. Pandas offers efficient tools for analyzing, visualizing, importing, and exporting time series data across various formats. These capabilities are further demonstrated in the following chapter.

Chap 9

#context:

I/O is OFTEN, if not always, the bottleneck of data analysis.
Data has to be read into memory to be processed, and results have to be written back to disk.
Analytic data less than 1GB is a sweet spot for Python.
Use the pickle package.
pickle stores objects on a FIFO (first in, first out) basis and keeps no metadata, so it is hard for a human to tell what a file contains. So store the objects in a dict with meaningful keys.
pickle itself is part of the standard library, but pickled objects of third-party classes depend on the package version. If the version changes, compatibility issues may arise. So, consider the built-in R/W of numpy and pandas.
pandas can read from a lot of formats. ![[Pasted image 20250416165202.png]]

Basics

pickle.dump()
pickle.load()
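
A minimal sketch of the dict-based pattern: one `dump`, one `load`, and the keys document what is inside. The file name `data.pkl` and the keys `x` and `y` are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical objects; the keys document what the file contains.
payload = {"x": [1.0, 2.5, 4.0], "y": [v ** 2 for v in (1.0, 2.5, 4.0)]}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
with open(path, "wb") as fh:          # binary mode is required
    pickle.dump(payload, fh)

with open(path, "rb") as fh:
    restored = pickle.load(fh)        # a single load returns the whole dict
```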

Good practices #KIM :

Use numpy’s built-in save/load; it’s faster than SQL or pickle.
Use PyTables and the HDF5 (h5) STRUCTURE.
The package name is PyTables, the import name is tables.
Use as few for loops as possible (a last resort).
pandas and PyTables suffice for the performance needs of SQL-like querying.
Using the compression PyTables provides never hurts.
Always use HDF5 with pandas.
More conveniently and specifically for finance, use TsTables created by Yves (or self-build based on it).

Ranking STRUCTURES by I/O performance(top is the fastest):

h5s
np
SQL
pd.to_csv
pd.to_excel

Resources

Conclusion

While relational (SQL) databases handle complex data relationships effectively, array-based approaches using tools like NumPy native I/O, PyTables, or Pandas with HDF5 often provide significant performance advantages for finance and science applications dealing with array-centric data. TsTables is specifically highlighted for large time series datasets, particularly in write-once, read-many scenarios.

Regarding hardware, the text advises caution against automatically choosing cloud-based scale-out solutions. It suggests evaluating whether fewer, more powerful “scale-up” servers (with many cores, large memory, potentially GPUs/TPUs) might offer comparable or superior performance and cost-efficiency for specific analytics workloads, citing a Microsoft study.

Ultimately, the recommendation is to first thoroughly analyze the specific data analytics tasks required, and then make an informed decision on the optimal hardware (scale-up vs. scale-out) and software architecture, as these choices significantly impact performance.

Chap 10: All About Performance

#KIM Good Practices:

Use Vectorization(numpy)
Compile to binary
- Dynamically(numba, numba wins overall.)
- Statically(cython)
Multiprocessing: used for running many different problems of the same type in parallel.

Pros and cons of each #KIM :

numpy uses vectorization, which considerably improves speed over standard Python, but might use more RAM.
numba works like a charm, but only for fairly specific use cases.
cython uses less RAM and is very fast, but is essentially C + Python, so it takes more effort to rewrite the code in a C-like language.
Recursive algos tend to recalculate the same subproblems each time. Using a cache decorator can dramatically improve performance.
Always keep overflow in mind: fixed-width TYPES (as used by numpy, numba, and cython) have bit limits, unlike Python's own int.

Prime numbers

Prime number algos are an important benchmark and also matter in encryption.

Fibonacci

A typical recursive problem.

Use iterative approach.
Use cache.
(Optional) Use Cython int128 TYPE.
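
A sketch of the three variants in plain Python. The function names are made up; `functools.lru_cache` from the standard library serves as the cache decorator.

```python
from functools import lru_cache


def fib_rec(n):
    """Naive recursion: recomputes the same subproblems again and again."""
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but each result is computed once and then cached."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def fib_iter(n):
    """Iterative version: no recursion at all."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x
```

Python's own int never overflows, but `fib_iter(93)` no longer fits into a signed 64-bit integer, which is where fixed-width types in Numba or Cython run into trouble.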

The number Pi

Use Monte Carlo simulation to estimate pi.

#KIM :

Generating large sets of random numbers consumes a lot of RAM.
The methods for these algorithms work similarly, if not identically, in the financial context.
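
A vectorized sketch of the pi estimate: draw random points in a square and count the share that falls inside the inscribed circle. The function name `mcs_pi` and the seed are assumptions.

```python
import numpy as np


def mcs_pi(n, seed=100):
    """Estimate pi from n random points in the square [-1, 1] x [-1, 1]."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))   # the whole sample sits in RAM
    inside = (pts ** 2).sum(axis=1) <= 1.0      # points inside the unit circle
    return 4.0 * inside.mean()                  # area ratio circle/square = pi/4


estimate = mcs_pi(1_000_000)
```

The `pts` array holds all draws at once, which is the RAM cost mentioned above; a loop-based version trades that memory for speed.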

Binomial Trees

I don’t get it.

Monte Carlo Simulation

#KIM :

MCS is an important numeric tool in finance.
Many algos can benefit from multiprocessing. MCS is a good case.
Usually, regular Python is fine. But in production, always apply the BEST solution, even if it takes more effort.

Conclusion

The Python ecosystem offers several accessible ways to significantly improve code performance. Key approaches include:

Using efficient Python idioms and paradigms, like vectorization, which often leads to more concise and faster code (though sometimes uses more memory).
Leveraging specialized high-performance packages such as NumPy and Pandas for array and DataFrame operations.
Compiling Python code using tools like Numba (dynamic compilation) or Cython (static compilation), particularly effective for financial algorithms.
Parallelizing code execution, commonly achieved using the multiprocessing package on a single machine, with further options available for cluster computing.

A major advantage highlighted is that these performance techniques are generally easy to implement using existing libraries, often representing readily achievable gains (“low-hanging fruit”).

Resources

Chap 11 On math

The function used in this chap:

$$ f(x) = \sin(x) + \frac{1}{2} x $$

Code:

def f(x):
	return np.sin(x) + 0.5 * x

Regression approach

Regression basically takes a set of basis functions and then calculates the parameters for these functions that best approximate the example function.

Code:

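import numpy as np
import matplotlib.pyplot as plt
def f(x):
	return np.sin(x) + 0.5 * x
x = np.linspace(-2 * np.pi, 2 * np.pi, 50) # sample points (interval and count are assumptions).
res = np.polyfit(x, f(x), deg=1, full=True) # deg=1 means linear; res[0] holds the coefficients.
def create_plot(x, y, styles, labels, axlabels):
	# x, y, styles, labels: lists with one entry per line to draw.
	plt.figure(figsize=(10, 6))
	for i in range(len(x)):
		plt.plot(x[i], y[i], styles[i], label=labels[i])
	plt.xlabel(axlabels[0])
	plt.ylabel(axlabels[1])
	plt.legend(loc=0)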

#KIM :

You can approximate by adjusting degrees.
Also by adjusting basis functions.
Regression can cope well with noises in the data.
Regression can also cope with unsorted data.
Regression works with N-Dimensions without any dramatic change in code.
Implementation is easy.

Interpolation

Basically, interpolation finds a spline (a piecewise polynomial curve with continuous derivatives) that passes through the data points.

#KIM :

Spline interpolation is often used in finance.
Limited to low dimension problems.
Require sorted data.

Code:

import scipy.interpolate as spi
x = np.linspace(-2 * np.pi, 2 * np.pi, 25)

![[Pasted image 20250417135209.png]]
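
A sketch that completes the snippet above with `splrep` (fit) and `splev` (evaluate) from `scipy.interpolate`. The cubic degree `k=3` and the check interval are assumptions.

```python
import numpy as np
import scipy.interpolate as spi


def f(x):
    return np.sin(x) + 0.5 * x


x = np.linspace(-2 * np.pi, 2 * np.pi, 25)   # sorted sample points
ipo = spi.splrep(x, f(x), k=3)               # fit a cubic spline (k=1 would be linear)
iy = spi.splev(x, ipo)                       # evaluate at the sample points

xd = np.linspace(1.0, 3.0, 50)               # points between the samples
err = np.max(np.abs(spi.splev(xd, ipo) - f(xd)))   # worst-case gap to the true function
```

At the sample points the spline reproduces the function values; between them a small error remains, and it shrinks with more points or a higher degree.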

Convex optimization

#KIM :

Convexity is important.
Generally, run a global optimization before a local one (there can be multiple local minima, and the algo can get trapped in one of them).
Crucial #Extremely_important : Know which optimization method fits which problem.

Global optimization by Brute Force

Adjusting step size can considerably help improve the accuracy of the result.

E.g.: let fm be a 2d function: #syntax

import numpy as np
import scipy.optimize as sco
def fm(p):
	x, y = p
	return (np.sin(x) + 0.05 * x ** 2
		+ np.sin(y) + 0.05 * y ** 2)
sco.brute(fm, ((-10, 10.1, 5), (-10, 10.1, 5)), finish=None) # step size is 5.
opt1 = sco.brute(fm, ((-10, 10.1, 0.1), (-10, 10.1, 0.1)), finish=None) # step size is 0.1.

Local optimization

Code: #syntax

sco.fmin(fm, opt1, xtol=0.001, ftol=0.001, maxiter=15, maxfun=20) # arguments: function to be minimized, starting parameter values (opt1 is the result of the brute-force run), input parameter tolerance, function value tolerance, max number of iterations, max number of function calls.

Constrained optimization

Code #syntax :

cons = ({'type': 'ineq','fun': lambda p: 100 - p[0] * 10 - p[1] * 10})# constraint
bnds = ((0, 1000), (0, 1000))# bounds
result = sco.minimize(Eu, [5, 5], method='SLSQP',bounds=bnds, constraints=cons)# Eu is the function to be optimized.
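
The snippet above does not show `Eu`. Below is a runnable sketch that assumes an expected-utility function with square-root utility over two securities and two equally likely states; the payoffs (15, 5) and (5, 12) are assumptions for illustration, while the budget constraint is the one from the code above.

```python
import numpy as np
import scipy.optimize as sco


def Eu(p):
    """Negative expected utility (assumed form), so that minimizing maximizes utility."""
    s, b = p
    return -(0.5 * np.sqrt(s * 15 + b * 5) + 0.5 * np.sqrt(s * 5 + b * 12))


# Budget constraint: 10 * s + 10 * b <= 100, written as g(p) >= 0.
cons = ({"type": "ineq", "fun": lambda p: 100 - p[0] * 10 - p[1] * 10},)
bnds = ((0, 1000), (0, 1000))   # no short positions

result = sco.minimize(Eu, [5, 5], method="SLSQP", bounds=bnds, constraints=cons)
s_opt, b_opt = result.x
```

`result.x` holds the optimal quantities and `-result.fun` the maximized expected utility; the budget is fully used at the optimum.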

Integration

Applies the most to valuation and pricing.

Code for plotting:

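import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
def f(x):
	return np.sin(x) + 0.5 * x
a, b = 0.5, 9.5 # integration limits (assumed; they match the linspace used further below).
x = np.linspace(0, 10)
y = f(x)
fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(x, y, 'b', linewidth=2)
plt.ylim(bottom=0)
Ix = np.linspace(a, b)
Iy = f(Ix)
verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)] # corner points of the shaded area.
poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
ax.add_patch(poly)
plt.text(0.75 * (a + b), 1.5, r"$\int_a^b f(x)dx$",
	horizontalalignment='center', fontsize=20)
plt.figtext(0.9, 0.075, '$x$')
plt.figtext(0.075, 0.9, '$f(x)$')
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([f(a), f(b)]);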

Code for computing:

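import scipy.integrate as sci
sci.fixed_quad(f, a, b)[0] # fixed Gaussian quadrature.
sci.quad(f, a, b)[0] # adaptive quadrature.
xr = np.linspace(a, b, 33) # romb works on samples and needs 2 ** k + 1 equally spaced points.
sci.romb(f(xr), dx=xr[1] - xr[0]) # Romberg integration.
xi = np.linspace(0.5, 9.5, 25)
sci.trapezoid(f(xi), xi) # trapezoidal rule.
sci.simpson(f(xi), x=xi) # Simpson's rule.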

By simulation:

for i in range(1, 20):
	np.random.seed(1000)
	x = np.random.random(i * 10) * (b - a) + a
	print(np.mean(f(x)) * (b - a))

Symbolic Computation

#KIM :

Use SymPy.
SymPy automatically simplifies math expressions.
SymPy has 3 kinds of rendering for pretty printing math expressions:
- LaTeX
- Unicode
- ASCII
Valuable for financial math.

SymPy basics #syntax :

import sympy as sy
x = sy.Symbol('x')
y = sy.Symbol('y')
sy.sqrt(x)
f = x ** 2 + 3 + 0.5 * x ** 2 + 3 / 2
sy.simplify(f)

Equations, Integration and Differentiation

use sy.solve to solve simple equations.

E.g. solving integration and differentiation:

![[Pasted image 20250417163707.png]] ![[Pasted image 20250417164050.png]]
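
Since the examples above are only available as images, here is a small sketch of `sy.solve`, `sy.integrate`, and `sy.diff`, applied to the example function of this chapter. The equation `x ** 2 - 1` and the limits 0.5 and 9.5 are assumptions.

```python
import sympy as sy

x = sy.Symbol("x")

roots = sy.solve(x ** 2 - 1)             # solves x ** 2 - 1 = 0

f = sy.sin(x) + x / 2                    # example function in symbolic form
F = sy.integrate(f, x)                   # antiderivative
area = sy.integrate(f, (x, 0.5, 9.5))    # definite integral over [0.5, 9.5]
dF = sy.diff(F, x)                       # differentiating F recovers f
```

`float(area)` can be compared with the numerical results of `sci.quad` and friends.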

Conclusion

This chapter introduces four key mathematical topics and tools relevant to finance:

Function Approximation: Important for applications like factor models, yield curve interpolation, and regression-based Monte Carlo methods for American options.
Convex Optimization: Frequently used in finance for tasks such as calibrating option pricing models to market data or implied volatilities.
Numerical Integration: Central to pricing options and derivatives, often involving calculating the discounted expected payoff under a risk-neutral measure (linking to stochastic process simulation covered in Chapter 12).
Symbolic Computation (using SymPy): Highlighted as a potentially useful and efficient tool for specific mathematical operations like symbolic integration, differentiation, and solving equations.

Resources

Chap12 Stochastics

#glossary Extremely simplified:

Stochastic process: a sequence of random variables in which the draws are generally not independent; a draw depends on the result(s) of the previous draw(s).
Markov property: tomorrow’s value of the process only depends on today’s state.

#KIM :

MCS is among #Extremely_important THE MOST IMPORTANT numerical techniques in finance.
Choose wisely the TYPES, STRUCTURES, and algos to tackle different types of problems.
Important for valuation.
Important for risk management.

Random numbers

Code:

import numpy.random as npr
npr.seed(100)
npr.rand(10)

![[Pasted image 20250417172252.png]] ![[Pasted image 20250417172449.png]] ![[Pasted image 20250417172514.png]]

BSM model

![[Pasted image 20250417173531.png]]

GBM

![[Pasted image 20250417175731.png]] ![[Pasted image 20250417175812.png]]

Square-root diffusion

![[Pasted image 20250417180115.png]]

Heston volatility

![[Pasted image 20250417220914.png]]

Feeling dizzy about all the models?

Here’s a note to demystify them:

My notes on Quant Finance

Jump Diffusion

#book Book page 369

![[Pasted image 20250417223558.png]]

You don’t have to understand them all at first sight

Scan through first.
Refer to book Chap 12 repeatedly.
Learning by practicing.

VaR

A risk measure widely adopted among industries and practitioners.
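
A sketch of a simulation-based VaR estimate: simulate index levels at the horizon with geometric Brownian motion, then read the loss off the lower percentiles of the profit and loss distribution. All parameter values are illustrative assumptions.

```python
import numpy as np

# All parameters are illustrative assumptions.
S0, r, sigma = 100.0, 0.05, 0.25     # initial level, short rate, volatility
T, I = 30 / 365.0, 100_000           # horizon in years, number of simulations

rng = np.random.default_rng(1000)
z = rng.standard_normal(I)
# Geometric Brownian motion, simulated in one step to the horizon.
ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)

pnl = ST - S0                          # absolute profit and loss per path
var_99 = -np.percentile(pnl, 1.0)      # loss exceeded in only 1% of the cases
var_95 = -np.percentile(pnl, 5.0)      # loss exceeded in only 5% of the cases
```

A higher confidence level always gives a higher VaR figure, since it looks further into the left tail.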

CVaR and CVA

Other risk measures.

Conclusion

This chapter focused on Monte Carlo simulation techniques for finance. It explained how to generate pseudo-random numbers according to different distributions and how to simulate the random variables and stochastic processes crucial in financial modeling.

Two major application areas were detailed:

Valuing both European and American options.
Estimating risk measures like Value-at-Risk (VaR) and Credit Valuation Adjustments (CVA).

The text concludes that Python combined with NumPy is well-suited for these often computationally demanding simulations. This effectiveness stems from NumPy’s C-based implementation providing significant speed advantages over pure Python, and its support for vectorized operations leading to more compact and readable code.

Chap 13 Statistics

#KIM #Extremely_important :

Portfolio theory: When the returns are normally distributed, the best portfolio composition relates only to:
- 1. mean return;
- 2. variance of the returns;
- 3. covariance of the returns.
Capital asset pricing model: When the returns are normally distributed (an assumption: the returns of an individual asset are taken to follow a Gaussian distribution), the price of a single stock is in a linear relationship with the market index.
Efficient market: Prices fluctuate randomly and returns are normally distributed.
Option pricing theory: BSM model with geometric Brownian motion.

Normality tests

Benchmarks:

Skewness test
Kurtosis test
Normality test

#KIM :

Real-world data often, if not always, shows fat tails (which jump diffusion models try to capture).
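
A sketch of the three tests with `scipy.stats`, run once on normal data and once on fat-tailed data. The Student-t sample stands in for real returns and the helper name `normality_report` is made up.

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(0)
normal_rets = rng.standard_normal(10_000) * 0.01
fat_rets = rng.standard_t(df=3, size=10_000) * 0.01   # Student-t: fat tails


def normality_report(arr):
    """Collect skewness, excess kurtosis, and the p-values of the three tests."""
    return {
        "skew": scs.skew(arr),
        "skew_p": scs.skewtest(arr)[1],
        "kurt": scs.kurtosis(arr),          # excess kurtosis, 0 for a normal
        "kurt_p": scs.kurtosistest(arr)[1],
        "norm_p": scs.normaltest(arr)[1],
    }


rep_normal = normality_report(normal_rets)
rep_fat = normality_report(fat_rets)
```

Small p-values reject the normality hypothesis; the fat-tailed sample shows a clearly positive excess kurtosis.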

MPT(Modern Portfolio Theory)

#KIM :

MPT is a cornerstone in economics.
MPT’s important factors are:
- Assumes normally distributed returns (for every single asset).
- Assumes mean and variance are the only necessary and sufficient statistics.
- Goal of MPT is to maximize the return while minimizing the risk.
- The covariance matrix is central when selecting assets.
- The setup does not allow shorts. Only longs, and all the longs add up to 100%.
- Mean-variance portfolio selection:
  - Expected portfolio variance
  - Expected portfolio return formula (Utility Index): $$ \mu_p = E\left(\sum_i w_i r_i\right) $$
#KIM Sharpe ratio formula:

$$ SR=\frac{\mu_p - r_f}{\sigma_p} $$

$\mu_p$ is the utility index (expected portfolio return).
$r_f$ is the risk-free rate.
$\sigma_p$ is the standard deviation of the portfolio returns.
#KIM : By definition, $\mu_p = \text{Excess Return} + r_f$. In a dollar-neutral portfolio the numerator essentially reduces to just $\mu_p$, which is the return given by the stock positions. The same works for a long-only strategy.
#KIM : $r_f$ can usually be set to 0 as long as there is no financing cost.
#KIM : You can either fix a target return level and find the minimal volatility, or fix a volatility level and find the max return level; all the optimal portfolios together comprise the efficient frontier.

$$ \text{Sharpe Ratio} = \frac{\text{Money the Investment Made} - \text{Money the Piggy Bank Made}}{\text{How Much the Investment Bounced Around}} $$
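
A numeric sketch of the portfolio statistics: expected return $w^T \mu$, volatility $\sqrt{w^T \Sigma w}$, and the Sharpe ratio. The returns are synthetic and the weights are an arbitrary long-only choice.

```python
import numpy as np

# Synthetic daily log returns for three hypothetical assets.
rng = np.random.default_rng(1)
rets = rng.normal(loc=0.0005, scale=0.01, size=(1000, 3))

mu = rets.mean(axis=0) * 252             # annualized mean returns
cov = np.cov(rets, rowvar=False) * 252   # annualized covariance matrix

w = np.array([0.5, 0.3, 0.2])            # long-only weights summing to 1
port_ret = w @ mu                        # mu_p = sum_i w_i * mu_i
port_vol = np.sqrt(w @ cov @ w)          # sigma_p = sqrt(w^T Sigma w)
r_f = 0.0                                # assumed risk-free rate
sharpe = (port_ret - r_f) / port_vol
```

Feeding the negative Sharpe ratio into `sco.minimize` with a sum-to-one constraint and bounds of (0, 1) per weight is one way to search for the maximum Sharpe portfolio.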

Key Takeaways

Most, if not all, financial models like MPT or CAPM rest on the assumption that returns of securities are normally distributed.
Testing for normality is important.
When stock returns are normally distributed, optimal portfolio choice can be cast into a setting where only the mean return, the variance of the returns, and the covariances between different stocks are relevant for an optimal portfolio composition.
When stock returns are normally distributed, prices of single stocks are in a linear relationship to a market index.
[[Compounds, integration, log-returns]]

Bayesian Statistics

What it is

In essence: Bayesian statistics is to update our prior probability (initial belief) about something into a posterior probability (updated belief) after considering what actually happened (the data).
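
The prior-to-posterior update in its simplest closed form, as a sketch: a Beta prior on the probability that a day closes up, updated with observed up and down days. This conjugate shortcut is used here instead of sampling with pymc; the prior parameters and the counts are assumptions.

```python
# Prior belief about the probability p that a day closes up: Beta(a, b).
a_prior, b_prior = 2, 2            # weak prior centered on 0.5 (assumption)
ups, downs = 14, 6                 # hypothetical observed data

# Conjugate update: a Beta prior with binomial data gives a Beta posterior.
a_post, b_post = a_prior + ups, b_prior + downs

prior_mean = a_prior / (a_prior + b_prior)   # belief before seeing data
post_mean = a_post / (a_post + b_post)       # belief after seeing data
```

The posterior mean lands between the prior mean and the observed frequency; more data pulls it closer to the data.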

Implementation

Requires pymc package.

Why Random Walk

Refer to: This note.

Machine Learning

Unsupervised

k-means (These data BELONG to this subset.)
Gaussian Mixture (These data are xx% likely to belong to this subset.)

Codifying: a general way:

Import model class
Instantiating model object
Fit model to data
Predict the outcome(Clusters)
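
The four steps as a sketch with scikit-learn's `KMeans`. The two synthetic blobs and all parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans             # 1. import the model class

# Two well separated synthetic blobs; purely illustrative data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0)  # 2. instantiate the model object
model.fit(X)                                             # 3. fit the model to the data
labels = model.predict(X)                                # 4. predict the clusters
```

Other scikit-learn models follow the same import, instantiate, fit, predict pattern.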

Supervised

Focuses:

Classification problem
Estimation problem

Algos:

Gaussian Naive Bayes
Logistic Regression
Decision Trees
Deep Neural Networks
Support Vector Machines

#KIM Refer to This to learn more.

Part 4 Algo Trading

Chap14 API

Generalization of workflow:

Get Started with Accounts and APIs
Data Retrieval:
- Tick Data
- Candles Data
- Historical Data
- Streaming Data
- Placing Orders
  - Buy
  - Sell
- Manage Account
  - Get balance
  - Get Margin
  - …

Chap 15 Trading Strats(Algo Trading)

Backtesting
- Buy-and-Hold Benchmark
- RWH and EMH Benchmark
- Train/Test Splits(Avoid overfitting)
  - Sequential
  - Randomized
Strats
- SMA(Simple Moving Average)
- Regression Methods: OLS Regression
- ML Methods
  - Classification
    - Logistic Regression
    - Gaussian Naive Bayes
    - Support Vector Machines(SVMs)
  - Clustering
    - k-means cluster
  - DNNs
- Frequency Approach

Chap 16 Automate

Get hands-on with APIs and some basic operations:

Retrieve data
- Historical
- Streaming
Place orders
- Buys
- Sells
Check account status

#KIM :

Vectorized backtesting only tests to a point.

Kelly Criterion

Finding the best fraction/leverage for the capital to be traded with.

To be more conservative, go for Half Kelly, as Full Kelly aims for a very high (though theoretically “best”) leverage and induces more risk.

More #KIM to be found on my notion notes about principles, KIMs, and experiences.

The optimal fraction/leverage:

$$ f^*=\frac{\mu-r}{\sigma^2} $$

$f^*$: The optimal fraction/leverage.
$\mu$: Expected return of the stock.
$\sigma$: The standard deviation of returns (volatility).
$r$: Constant short rate, defaults to $0$.
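
The formula as a sketch in code, applied to annualized estimates from a synthetic return series. The helper name `kelly_fraction` and the simulated data are assumptions.

```python
import numpy as np


def kelly_fraction(mu, sigma, r=0.0):
    """Optimal Kelly fraction f* = (mu - r) / sigma ** 2."""
    return (mu - r) / sigma ** 2


# Synthetic daily log returns standing in for a real stock series.
rng = np.random.default_rng(7)
rets = rng.normal(loc=0.0004, scale=0.01, size=2520)

mu = rets.mean() * 252                # annualized expected return
sigma = rets.std() * np.sqrt(252)     # annualized volatility

f_full = kelly_fraction(mu, sigma)    # full Kelly
f_half = 0.5 * f_full                 # half Kelly, the more cautious choice
```

With $\mu = 10\%$, $\sigma = 20\%$, and $r = 0$ the formula gives a leverage of 2.5, which shows how aggressive full Kelly can be.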

Risk Analysis

Drawdown
VaR (Value-at-Risk)
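
Drawdown can be sketched with a running maximum: it is the gap between the highest equity level seen so far and the current level. The equity curve below is a hypothetical example.

```python
import pandas as pd

# Hypothetical equity curve (cumulative strategy wealth).
equity = pd.Series([100.0, 110.0, 105.0, 120.0, 90.0, 95.0, 130.0])

cummax = equity.cummax()                 # running maximum
drawdown = cummax - equity               # absolute drawdown at each point
max_dd = drawdown.max()                  # largest gap: 120 - 90 = 30
max_dd_pct = (drawdown / cummax).max()   # relative to the peak: 30 / 120 = 0.25
```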

Refer to my notion notes:

What do they mean
