Resources
The Quant Platform website: http://py4fi.pqp.io
- Our company website http://tpq.io
- My private website http://hilpisch.com
- Our Python books website http://books.tpq.io
- Our online training website http://training.tpq.io
- The Certificate Program website http://certificate.tpq.io
- Training program: http://pyalgo.tpq.io
The AI Machine: http://aimachine.io
Conventions Used in This Book
Italic: for terms, URLs, email addresses.
Monospace: for code and technical terms (program elements).
Monospace and italic: for user-defined values.
![[Pasted image 20250303205954.png]]
Prep for coding
Creating an ad-hoc conda environment for this project.
Supplemental material (in particular, Jupyter Notebooks and Python scripts/modules) is available for usage and download at http://py4fi.pqp.io.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Importing means making a package available to the current namespace and the current Python interpreter process.
You must be familiar with the NumPy ndarray and pandas DataFrame data structures.
- Finance firms are spending trillions on tech.
- Tech can go wild (e.g., the 2010 Flash Crash).
Chap 1 Why Python for Finance
Context w/ modern finance:
- Firms have to invest heavily in tech to stay competitive, millions and millions;
- Firms are becoming more and more tech-savvy (data processing, analytic speed, theoretical foundations).
- Real-time analytics
About Virtual Environment
Course code repo: https://github.com/yhilpisch/py4fi2nd
Solutions:
- Create an ad-hoc `venv` or `conda` environment for the project, on a Linux distro. Regarding virtual environments, containers, and Anaconda: https://g.co/gemini/share/b853fe5cdef1
- Use a MacBook, then follow: https://github.com/yhilpisch/py4fi2nd
My MacBook use guide link here
Part 1
Chap 1 Intro
Key takeaways:
- Naming conventions in Python resemble those of the real world. This characteristic makes Python an efficient choice.
- High abstraction and rigid implementation.
- Firms might implement Python from prototyping to production.
- Important theories like MPT and CAPM lack data-driven support; they are applied mostly on the basis of experience.
- Outdated theories rely on weak assumptions.
General tips to improve efficiency with Python:
- Use a simpler approach (fewer loops, more vectorization)
- Use specialized packages to handle data
- Use parallelization.
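To make the first two tips concrete, here is a minimal sketch that computes the same result with an explicit loop and with a vectorized NumPy expression. The sample data and variable names are assumptions made up for illustration; they do not come from the book.

```python
import numpy as np

# Hypothetical sample data: 1.0, 2.0, ..., 5.0
values = np.arange(1, 6, dtype=float)

# Loop-based approach: one Python-level iteration per element.
squares_loop = []
for v in values:
    squares_loop.append(v ** 2)

# Vectorized approach: one expression applied to the whole array at once.
squares_vec = values ** 2

print(squares_loop)
print(squares_vec)
```

Both approaches yield the same numbers; the vectorized form is shorter and pushes the iteration into NumPy's compiled code.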
A proper treatment of AI-first finance, however, would require a book fully dedicated to the topic. This book only provides an entry-level understanding of how to apply AI to finance.
Conclusion
Python, with its elegant syntax, efficient development approaches, and versatility for both prototyping and production, stands as an ideal technological framework for the financial industry. Its extensive ecosystem of packages, libraries, and tools addresses the challenges posed by recent developments in finance, including analytics, data management, compliance, and technology. Python streamlines end-to-end development and production, and its dominance in AI, machine learning, and deep learning makes it the go-to language for data-driven and AI-first finance, which are reshaping the financial industry.
Chap 2 Deployment
This chapter covers techniques for Python deployment:
- package managers
- Virtual environment managers
- containers
- cloud instances
Getting familiar with conda
#syntax #code
`conda` can serve as a package manager as well as an environment manager.
Getting familiar with docker
A Docker container is an isolated filesystem containing an OS (e.g., Ubuntu), Python runtime, tools, and libraries. It runs uniformly across platforms (e.g., Windows 10 or cloud Linux).
Build an Ubuntu Python docker image
Prep:
- `apt-get` updates
- Install `conda`, `python` and other necessary OS packages using `install.sh`
- Install `docker`, `docker-compose`
Notes on colima, docker, shell, conda, and python
Scope:
- For `py4fi` example illustration purposes only.
KIM:
- `conda` can manage packages and environments.
- `docker` can instantiate an environment in a `container`.
- You can build a pre-defined image using `docker build`.
conda basics
#syntax
- Install packages: `conda install <name0> <name1> ... -y`
- Search packages: `conda search <name>`
- Update packages: `conda update <name>`
- Remove packages: `conda remove <name>`
- List packages: `conda list`
- Create a virtual env: `conda create -n $ENVIRONMENT_NAME`
- Activate a virtual env: `conda activate <env_name>`
- Deactivate a virtual env: `conda deactivate`
- Remove a virtual env: `conda env remove -n <env_name>`
- List virtual envs: `conda env list`
- Export environment configs: `conda env export > $FILE_NAME`
- Create a virtual env from a config file: `conda env create -f $FILE_NAME`
shell basics
#syntax #code
- Check `conda`: `conda --version`
- Check `docker`: `docker --version`
- Check `python`: `python --version`
Package management for Debian-like OS:
- Update packages: `apt-get update; apt-get upgrade -y`
- Install packages: `apt-get install -y $PACKAGE_NAME`
colima basics(Mac specific)
#syntax #code
Goal: To have a minimalist working docker daemon on MacOS.
Using colima on MacOS to render a minimalist, vanilla docker daemon.
- Installation: `brew install colima`
- Create a `docker` daemon using `colima`: `colima start -e  # to edit demanding configs like disk usage, memory assignment, cpu assignment, etc.`
- After `colima` has started, the `docker` daemon is up and running; onward to `docker` manipulations.
- Stop daemon: `colima stop`
- Remove a daemon: `colima delete <name>`
Noteworthy configs:
- Disk usage
- Memory assignment
- CPU assignment
- Arch
- Mount volume type
- Mount point
- Virtualization framework
docker basics
#syntax
- List images: `docker images`
- List all containers: `docker ps -a`
- List running containers: `docker ps`
- Run a container based on an image: `docker run -it <image_name>:<release_tag>`
- Start a container: `docker start <name_or_id>`
- Stop a container: `docker stop <name_or_id>`
- Attach the session to a RUNNING docker container: `docker attach <name_or_id>`
- Remove a container: `docker rm <name_or_id>`
Into the docker shell
#syntax
Goal: To install all necessary and sufficient packages for a python deployment(mainly conda and other python packages.)
- List running containers: `docker ps`
- List all containers: `docker ps -a`
- `ssh` into a simple container created with the `ubuntu:latest` image: `docker run -ti -h py4fi -p 11111:11111 ubuntu:latest /bin/bash`
    - After `ssh`ed into the docker container, the rest is just unix.
    - Fetch `miniconda.sh` according to the `arch`, link here.
    - Initialize `conda` using the fetched `miniconda.sh` installation.
    - Install some example packages like `numpy`, `scipy`, `ipython`, `pandas`, etc.
Building your own image
In an ad-hoc docker build directory, create two files, like shown in the book:
juan@juans-MacBook-Air docker-build % tree
.
├── Dockerfile
└── install.sh
1 directory, 2 files
Note:
- You are just creating an image; do not run the scripts on the host.
- But images are created on the host.
- Just `cd` into the directory and run `docker build ...`.
Basic manipulations:
- Build an image: `docker build -t py4fi:basic .`
- Remove an image: `docker rmi <name_or_id_of_the_image>`
Going Cloud
Providers to consider:
- DigitalOcean
- AWS
Goal:
- Apply the above to a cloud infrastructure.
- Setup an online jupyter notebook.
- Learn basic SSL(Namely OpenSSL)
- Developing the Python deployment through any browser.(Maybe even phone?)
Prep:
- Use the book-provided scripts to initialize the deployment (DigitalOcean Droplet)
- Get a VPS instance
- Refer to the official Jupyter Notebook docs to deploy (this pretty much sums up everything you need to know, and it is very easy and quick to deploy.)
- If using the code from the book, just follow it; the basic steps are as follows:
    - Creating RSA keys: `openssl req -x509 -nodes -days 365 -newkey rsa:1024 -out cert.pem -keyout cert.key`
    - Follow onscreen instructions
    - Generate a hash-protected password: `passwd('replace_with_an_actual_password')` (`passwd` comes from the `notebook.auth` module, not from a Python built-in package)
    - `nano` the jupyter notebook config file.
    - `nano` the installation script.
    - DigitalOcean Droplet orchestration setup script.
How to use the book code:
- `cd` into `../ch02/cloud`
- Gen a `cert.key` and `cert.pem` using the book-provided `openssl` code.
- Use the `old_notebook_env` to get a hashed password like the example in the book.
KIM:
- Use the security measure provided by Jupyter Notebook(A.K.A. JupyterLab, JupyterHub).
- Follow the official guide to deploy JupyterHub, then onward to the book’s installation script.
#syntax #code
[!code]- Click to show installation script
#!/bin/bash
#
# Script to Install
# Linux System Tools,
# Basic Python Packages and
# Jupyter Notebook Server
#
# Python for Finance, 2nd ed.
# (c) Dr. Yves J. Hilpisch
#
# GENERAL LINUX
apt-get update  # updates the package index cache
apt-get upgrade -y  # updates packages
apt-get install -y bzip2 gcc git htop screen vim wget  # installs system tools
apt-get upgrade -y bash  # upgrades bash if necessary
apt-get clean  # cleans up the package index cache
# INSTALLING MINICONDA
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O \
    Miniconda.sh
bash Miniconda.sh -b  # installs Miniconda
rm Miniconda.sh  # removes the installer
# prepends the new path for current session
export PATH="/root/miniconda3/bin:$PATH"
# prepends the new path in the shell configuration
echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
echo "conda activate" >> ~/.bashrc
# INSTALLING PYTHON LIBRARIES
# More packages can/must be added
# depending on the use case.
conda update -y conda  # updates conda if required
conda create -y -n py4fi python=3.7  # creates an environment
source activate py4fi  # activates the new environment
conda install -y jupyter  # interactive data analytics in the browser
conda install -y pytables  # wrapper for HDF5 binary storage
conda install -y pandas  # data analysis package
conda install -y matplotlib  # standard plotting library
conda install -y scikit-learn  # machine learning library
conda install -y openpyxl  # library for Excel interaction
conda install -y pyyaml  # library to manage YAML files
pip install --upgrade pip  # upgrades the package manager
pip install cufflinks  # combining plotly with pandas
Conclusion
Key Python deployment solutions for finance include:
- Conda Environments
    - Create project-specific environments (`py4fi2nd.yml`) for dependency isolation and reproducibility.
- Docker Containers
    - Containerize environments to ensure consistency across development/production stages.
- Cloud Infrastructure
    - Leverage platforms like DigitalOcean for scalable, real-time analytics and code execution.
Part 2 On to Python
• Chapter 3 focuses on Python data types and structures.
• Chapter 4 is about NumPy and its ndarray class.
• Chapter 5 is about pandas and its DataFrame class.
• Chapter 6 discusses object-oriented programming (OOP) with Python.
Chap 3 Data Types and Data Structures
“Types” and “Structures” are not the same.
“TYPES” from here onward refers to “Data Types” in this chapter.
“STRUCTURES” from here onward refers to “Data Structures” in this chapter.
Basic Types: ![[Pasted image 20250401221249.png]]
Basic structures: ![[Pasted image 20250401221434.png]]
Good practices:
- Use `type()` to check types.
- Use `ipython` or other intellisense auto-completion to check for functions, classes, methods and the like.
- Use `dir()` to check the complete list of attributes and methods of any object.
Floats
Fun but actually important fact: Python stores floating-point numbers in binary, with a fixed number of bits per number (on virtually all platforms a 64-bit IEEE 754 double). Many decimal fractions, such as 0.1 or 0.35, have no finite binary representation, much like 1/3 has no finite decimal representation. Squeezing such a number into a fixed number of bits forces rounding, so inaccuracies occur. So, you might see this in ipython: ![[Pasted image 20250401222431.png]] The issue can be of importance when summing over a large set of numbers. In such a situation, a certain kind and/or magnitude of representation error might, in aggregate, lead to significant deviations from a benchmark value.
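The representation error described above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names are chosen for illustration.

```python
import math

# 0.1, 0.2 and 0.3 have no finite binary representation, so the stored
# values are approximations and the exact comparison fails.
naive_equal = (0.1 + 0.2 == 0.3)

# Comparing with a tolerance is a common workaround for floats.
close_enough = math.isclose(0.1 + 0.2, 0.3)

# math.fsum tracks intermediate rounding errors while summing.
accurate_sum = math.fsum([0.1] * 10)

print(naive_equal)   # False
print(close_enough)  # True
print(accurate_sum)  # 1.0
```

For sums over many floats, `math.fsum` (or a tolerance-based comparison) is often enough; exact decimal arithmetic is a separate option when accuracy is the priority.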
How to address this issue
Use the decimal package to specifically handle floats if accuracy is a priority.
Basic example: #syntax #code
[!code]- Click to show basic example with `decimal`
import decimal
from decimal import Decimal
decimal.getcontext()  # inspect the current context (default precision: 28 digits)
Decimal(1) / Decimal(11)
decimal.getcontext().prec = 4  # prec is an attribute of the context, not a method
Decimal(1) / Decimal(11)  # Decimal('0.09091')
decimal.getcontext().prec = 50
Decimal(1) / Decimal(11)
Bool
- Comparison operators: `<`, `>`, `<=`, `>=`, `!=`, `==`: yields bool
- Logic operators: `and`, `or`, `not`: yields bool
- Any non-zero number yields `True`.
>>> bool(0.0)
False
>>> bool(4214.2)
True
>>> bool(-432)
True
Good Practices
- Play with it.
Strings
KIM:
- Any `type` in Python is an object, meaning: any object has its `class` and `method`s to call upon.
E.g.: #code
>>> a_string = f"This is a string."
>>> a_string.strip(" his")
'This is a string.'
>>> a_string.strip(" T")
'his is a string.'
>>> a_string.strip(" t")
'This is a string.'
>>> a_string.replace(" ", "-")
'This-is-a-string.'
>>> a_string.
a_string.capitalize() a_string.index( a_string.isspace() a_string.removesuffix( a_string.startswith(
a_string.casefold() a_string.isalnum() a_string.istitle() a_string.replace( a_string.strip(
a_string.center( a_string.isalpha() a_string.isupper() a_string.rfind( a_string.swapcase()
a_string.count( a_string.isascii() a_string.join( a_string.rindex( a_string.title()
a_string.encode( a_string.isdecimal() a_string.ljust( a_string.rjust( a_string.translate(
a_string.endswith( a_string.isdigit() a_string.lower() a_string.rpartition( a_string.upper()
a_string.expandtabs( a_string.isidentifier() a_string.lstrip( a_string.rsplit( a_string.zfill(
a_string.find( a_string.islower() a_string.maketrans( a_string.rstrip(
a_string.format( a_string.isnumeric() a_string.partition( a_string.split(
a_string.format_map( a_string.isprintable() a_string.removeprefix( a_string.splitlines(
A brief list of methods for str() object:
print()
You can apply format strings to print().
regex (Regular Expression)
Good Practices
- Play with it.
Structures
- `list`: More flexible (most of the time, working ONLY with `list` is sufficient.)
- `tuple`: More rigid (immutable)
- `dict`
- `set`
KIM:
- Sequence structures in Python (`list`, `tuple`) have a built-in index.
- The index uses 0-based indexing.
- Usually, `=` means assigning values, `==` means comparing values.
#code
In [102]: l = [1, 2.5, 'data']
l[2]
Out[102]: 'data'
In [103]: l = list(t)  # t is a tuple defined earlier, e.g. t = (1, 2.5, 'data')
l
Out[103]: [1, 2.5, 'data']
In [104]: type(l)
Out[104]: list
In [105]: l.append([4, 3])
l
Out[105]: [1, 2.5, 'data', [4, 3]]
In [106]: l.extend([1.0, 1.5, 2.0])
l
Out[106]: [1, 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [107]: l.insert(1, 'insert')
l
Out[107]: [1, 'insert', 2.5, 'data', [4, 3], 1.0, 1.5, 2.0]
In [108]: l.remove('data')
l
Out[108]: [1, 'insert', 2.5, [4, 3], 1.0, 1.5, 2.0]
In [109]: p = l.pop(3)
print(l, p)
[1, 'insert', 2.5, 1.0, 1.5, 2.0] [4, 3]
Control Structures
help(range)
- Typically: the `for` loop.
    - `for` is typically used with `list` objects.
- Counter-based loops (like `i = 0; while i < stuff` in other languages) are typically implemented in Python using the `range` object. You could also achieve the same thing with `while`.
`help` on `range`:
class range(object)
| range(stop) -> range object
| range(start, stop[, step]) -> range object
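As a small illustration of the two loop styles mentioned above, the sketch below walks through the same counter values once with `range` and once with `while`. The list names are made up for the example.

```python
# Counter-based loop written with range(start, stop, step).
collected_for = []
for i in range(2, 10, 2):
    collected_for.append(i)

# The same iteration written with while and a manual counter.
collected_while = []
i = 2
while i < 10:
    collected_while.append(i)
    i += 2

print(collected_for)    # [2, 4, 6, 8]
print(collected_while)  # [2, 4, 6, 8]
```

The `range` version keeps the counter logic in one place, which is why it is the usual choice.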
List comprehension in Python
E.g.:
In [117]: m = [i ** 2 for i in range(5)]
m
Out[117]: [0, 1, 4, 9, 16]
![[Pasted image 20250408190130.png]]
Good Practices
- Play with it.
- Keep loops as minimal as possible (use built-in functions, methods, `lambda`, `map()`, etc.)
Functional programming
Function Definition
def f(x[, argument1[, argument2...]...]):
return x
Tools for functions
#code
Help on map:
Help on class map in module builtins:
class **map**(object)
| map(function, iterable, /, *iterables)
|
| Make an iterator that computes the function using arguments from
| each of the iterables. Stops when the shortest iterable is exhausted.
Pay attention to the arguments of map.
E.g.:
In [120]: list(map(even, range(10)))
Out[120]: [True, False, True, False, True, False, True, False, True, False]
Anonymous function: lambda
Example with lambda:
lambda input: output_of_input
filter
E.g., filter an iterable for its even elements:
In [13]: list(filter(lambda x: x % 2 == 0, range(10)))
Out[13]: [0, 2, 4, 6, 8]
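The `map` example above relies on a function named `even` that these notes do not define. The sketch below assumes a straightforward definition and puts `map`, `lambda` and `filter` side by side.

```python
def even(x):
    """Return True if x is divisible by 2 (assumed definition of `even`)."""
    return x % 2 == 0

# map applies the function to every element of the iterable.
flags = list(map(even, range(10)))

# lambda defines a small anonymous function in place.
squares = list(map(lambda x: x ** 2, range(5)))

# filter keeps only the elements for which the function returns True.
evens = list(filter(even, range(10)))

print(flags)
print(squares)  # [0, 1, 4, 9, 16]
print(evens)    # [0, 2, 4, 6, 8]
```

Both `map` and `filter` return lazy iterators, which is why the results are wrapped in `list()`.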
Good Practices
- Play with it.
- Keep loops as minimal as possible, even though the looping is then only implicit (use built-in functions, methods, `lambda`, `map()`, etc.)
dicts
`dict` objects:
- mutable, like `list`
- concept of key-value pair
- unordered in older Python versions; since Python 3.7 the insertion order is preserved
- not sortable in place (generally)
- defined using `{}`
- has built-in methods, like any other object.
Methods of dict:
#code
![[Pasted image 20250409130809.png]]
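A few of the most common `dict` operations, as a minimal sketch. The keys and values are hypothetical sample data, not taken from the book.

```python
# Hypothetical key-value data describing a quote.
quote = {"symbol": "AAPL", "price": 150.0}

quote["currency"] = "USD"  # add a new key-value pair
quote["price"] = 151.5     # overwrite an existing value (dicts are mutable)

keys = list(quote.keys())      # the keys, in insertion order
values = list(quote.values())  # the values
items = list(quote.items())    # (key, value) tuples

# get returns a default instead of raising KeyError for a missing key.
volume = quote.get("volume", 0)

print(keys)
print(values)
print(items)
print(volume)
```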
sets
- not too many applications(not typical)
- unordered
- collections of other objects
- trimmed elements(every element is unique)
- can be applied with math set theory
- one application is to get rid of duplicates in a `list` object
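The points above can be sketched as follows, with made-up sample data. Note that a `set` does not keep the original order; `dict.fromkeys` is shown as one order-preserving alternative.

```python
# Hypothetical list with duplicates.
tickers = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]

unique = set(tickers)                          # duplicates removed, order not guaranteed
ordered_unique = list(dict.fromkeys(tickers))  # duplicates removed, order kept

# Operations from set theory.
a = {1, 2, 3}
b = {3, 4}
union = a | b         # elements in a or b
intersection = a & b  # elements in both
difference = a - b    # elements in a but not in b

print(sorted(unique))
print(ordered_unique)
print(union, intersection, difference)
```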
Conclusion
- Basic data types: `int`, `float`, `bool`, and `str` serve as atomic types.
- Standard data structures: `tuple`, `list`, `dict`, and `set` are widely applicable, with `list` being particularly flexible for diverse financial use cases.
Chap4: NumPy
#numpy
numpy expands data structures to arrays.
Arrays
But first, key downsides of using list:
- high memory usage
- slow performance
For real applications, arrays prevail.
In the more common case, an array represents an i × j matrix of elements.
`numpy` specializes in arrays.
Get started with array
Python has a built-in `array` module that handles arrays with very limited functionality.
import array
A simple list object is considered a 1d array:
In [1]: v = [0.5, 0.75, 1.0, 1.5, 2.0]
A nested list objects(n-dimensional array…):
In [2]: m = [v, v, v]
m
Out[2]: [[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0],
[0.5, 0.75, 1.0, 1.5, 2.0]]
- Elements are interlinked by default: a list stores references, so if you build a list object from another list object, the new list changes whenever the other list object is mutated. To prevent this, use the `deepcopy` function from the `copy` module.
- `array`s have some basic built-in file operation functionalities (like storing into a file, etc.)
- `array`s can be converted to `list` if needed.
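The reference behavior of nested lists is easy to verify. This is a minimal sketch reusing the `v` and `m` names from above; `m_copy` is a name introduced here for illustration.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]         # three references to the same list object
m_copy = deepcopy(m)  # independent copy, taken before v is changed

v[0] = "Python"       # mutate the original list

print(m[0][0])        # 'Python': m sees the change through its references
print(m_copy[0][0])   # 0.5: the deep copy is unaffected
```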
numpy arrays
Basics:
#code #numpy
In [28]: import numpy as np
In [29]: a = np.array([0, 0.5, 1.0, 1.5, 2.0])
a
Out[29]: array([0. , 0.5, 1. , 1.5, 2. ])
In [30]: type(a)
Out[30]: numpy.ndarray
In [31]: a = np.array(['a', 'b', 'c'])
a
Out[31]: array(['a', 'b', 'c'], dtype='<U1')
In [32]: a = np.arange(2, 20, 2)
a
Out[32]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [33]: a = np.arange(8, dtype=float)  # np.float was removed in NumPy 1.24; use float or np.float64
a
Out[33]: array([0., 1., 2., 3., 4., 5., 6., 7.])
In [34]: a[5:]
Out[34]: array([5., 6., 7.])
In [35]: a[:2]
Out[35]: array([0., 1.])
- `numpy.ndarray` has a lot of built-in methods to provide insights on statistics, computation, manipulation, etc.
- Operations executed upon `ndarrays` usually are vectorized (Refer to: On Vectorization), which is more intuitive than with vanilla `list` objects.
- With computations on single (scalar) floats, the `math` module beats `numpy`.
- `numpy` methods are universal, meaning they can be applied to the basic Python data types.
What does it mean by universal:
In [1]: import numpy as np
In [16]: b = [1,2,3]
In [17]: np.sqrt(b)
Out[17]: array([1. , 1.41421356, 1.73205081])
non-vectorized vs vectorized
![[Pasted image 20250409143616.png]]
math vs numpy
![[Pasted image 20250409143125.png]]
np.exp() vs **
In general, `np.exp(x)` computes the natural exponential e to the power of x, element-wise, while `x ** n` raises x to the power of n.
#code #numpy
In [1]: import numpy as np
In [2]: help(np.exp)
In [3]: a = [0, 25, 50, 75, 100]
In [4]: print(np.exp(a))
[1.00000000e+00 7.20048993e+10 5.18470553e+21 3.73324200e+32
2.68811714e+43]
In [5]: print(np.exp(a, 2))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 print(np.exp(a, 2))
TypeError: return arrays must be of ArrayType
In [6]: help(np.exp)
In [7]: a ** 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 a ** 2
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
In [8]: a = np.array(a)
In [9]: a
Out[9]: array([ 0, 25, 50, 75, 100])
In [10]: a ** 2
Out[10]: array([ 0, 625, 2500, 5625, 10000])
Multiple dimensions
![[numpy_ndarray.png]]
![[SCR-20250410-mjlt.png]]
numpy’s dtypes:
![[Pasted image 20250410194734.png]]
np.linspace
Creates a one-dimensional ndarray object with evenly spaced numbers over an interval; the parameters used are `start`, `stop`, and `num` (number of elements).
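A short sketch of `np.linspace` with hypothetical interval bounds. By default the `stop` value is included; `endpoint=False` excludes it.

```python
import numpy as np

# Five evenly spaced points from 0.0 to 1.0, both ends included.
grid = np.linspace(0.0, 1.0, num=5)

# Four evenly spaced points, stop value excluded.
half_open = np.linspace(0.0, 1.0, num=4, endpoint=False)

print(grid)       # [0.   0.25 0.5  0.75 1.  ]
print(half_open)  # [0.   0.25 0.5  0.75]
```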
Key Takeaways(for ndarray objects):
- Metainformation: `arr.size`, `arr.itemsize`, `arr.ndim`, `arr.shape`, `arr.dtype`, `arr.nbytes`
- Reshaping
    - KIM:
        - Generally, reshaping only returns another view of the array;
        - resizing creates a new, temp object
        - You can assign the reshaped result to a new/old variable if that’s desired.
    - Reshaping: `np.arange()`, `arr.shape`, `np.shape(arr)`, `arr.reshape((row, col))`
        - dump to a new variable: `new_arr = arr.reshape((row, col))`
- Transpose
    - `new_arr.T  # Turns rows into cols and cols into rows`
    - `new_arr.transpose()`
- KIM:
During a reshaping operation, the total number of elements in the ndarray object is unchanged. During a resizing operation, this number changes: it either decreases (“down-sizing”) or increases (“up-sizing”).
- Resizing:
    - `np.resize(arr, (new_row, new_col))`
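The difference between reshaping and resizing can be checked directly. This is a minimal sketch with hypothetical array names; `np.shares_memory` is used only to show that the reshaped array is a view on the same data.

```python
import numpy as np

g = np.arange(15)

# Reshaping: same 15 elements, new shape, no data copied.
h = g.reshape((3, 5))
is_view = np.shares_memory(g, h)

# Transposing swaps rows and columns.
h_t = h.T

# Resizing: the number of elements changes and a new array is created.
small = np.resize(g, (2, 3))  # down-sizing keeps the first 6 elements
large = np.resize(g, (4, 5))  # up-sizing repeats the data to fill 20 slots

print(h.shape, h_t.shape, is_view)
print(small)
print(large)
```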
The size of the connecting dimension must be the same for stacking operations.
- Stacking:
    - horizontal stacking: `np.hstack((<ndarrays>, <operations>))`
    - vertical stacking: `np.vstack((<ndarray_objects>, <operations>))`
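A minimal stacking sketch with a hypothetical 2 × 3 array. The second element of each tuple is the result of an operation on the first, matching the pattern in the list above.

```python
import numpy as np

h = np.arange(6).reshape((2, 3))

# Horizontal stacking joins along columns; the row counts must match.
wide = np.hstack((h, 2 * h))

# Vertical stacking joins along rows; the column counts must match.
tall = np.vstack((h, 0.5 * h))

print(wide.shape)  # (2, 6)
print(tall.shape)  # (4, 3)
```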
Flattening reduces an ND `ndarray` object to a 1D object. It can happen row-by-row or col-by-col.
- Flattening:
    - `h.flatten(order='C|F')  # C: row by row, F: col by col`
    - The `.flat` iterator scans element by element; `.ravel(order="C|F")` returns the flattened array in the specified order.
Comparison and logical operators work on `ndarray` objects element-wise. Boolean arrays can be used for indexing and data selection.
- Boolean Arrays
    - `arr <op> arr` with `<`, `>`, `<=`, `>=`, `==` (conditions can be combined with `&`) returns a new array that holds the results as `bool`.
    - Important method: `np.where(<a_logical_statement>, <value_to_assign_when_true>, <value_to_assign_when_false>)`. The values to be assigned can be any of the basic `TYPES`.
    - `bool_ndarray.astype(int)  # To represent True|False as 1|0.`
![[Pasted image 20250410230739.png]]
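The Boolean-array operations listed above can be sketched as follows, using a hypothetical 3 × 4 array.

```python
import numpy as np

h = np.arange(12).reshape((3, 4))

mask = h > 7        # element-wise comparison gives a Boolean array
selected = h[mask]  # Boolean indexing returns the matching values

# Conditions are combined with & (and) or | (or); parentheses are required.
combined = h[(h > 2) & (h % 2 == 0)]

# np.where assigns one value where the condition holds, another where not.
labels = np.where(h > 7, 1, 0)

# astype(int) maps True/False to 1/0.
as_int = mask.astype(int)

print(selected)  # [ 8  9 10 11]
print(combined)  # [ 4  6  8 10]
print(labels)
```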
Speed:
numpygenerally wins, if not always.
Structured:
`NumPy` allows you to have a different `dtype` PER column. #code
import numpy as np
# Three ways to define the dtype
# 1. Define explicitly as a list of (name, format[, shape]) tuples
dt = np.dtype([
    ('Name', 'S10'),            # 'Name' field: byte string of maximum length 10
    ('Age', 'i4'),              # 'Age' field: 4-byte (32-bit) integer
    ('Height', 'f'),            # 'Height' field: single-precision (32-bit) float
    ('Children/Pets', 'i4', 2)  # 'Children/Pets' field: shape (2,) array of 4-byte integers
])
# 2. Define in a more readable way
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'],
               'formats': 'O int float int,int'.split()})
# Notice `formats` as the key of the dtype, values are the dtypes to be defined.
# 3. Same as 2, with the list written out explicitly, since `split` essentially returns a list.
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'],
               'formats': ['O', 'int', 'float', 'int,int']})  # Notice the composite 'int,int' dtype.
# The composite 'int,int' field is itself a record, so its values are passed as tuples.
data = np.array([
    ('Alice', 30, 5.5, (2, 1)),
    ('Bob', 25, 6.0, (0, 2))
], dtype=dt)
Define dtype in a funnier way(Does not work if there is only 1 field…):
In [27]: dt = np.dtype({
    ...:     'names': ['Name','/Age/Children/Pets/Houses/Income/Height'],
    ...:     'formats': ['O','int,int,int,int,int,float']
    ...: })
In [28]: s = np.array([('Smith', (45, 0, 2, 1, 100, 1.83))], dtype=dt)
In [29]: s
Out[29]:
array([('Smith', (45, 0, 2, 1, 100, 1.83))],
dtype=[('Name', 'O'), ('/Age/Children/Pets/Houses/Income/Height', [('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '<f8')])])
- Indexing w/ `dtype`:
    - `arr["Name_of_the_Col"].methods`
    - `arr[<the_index_of_rows>]`
    - `arr[<row>][<"Col">]`
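The three indexing patterns can be tried on a small structured array. The field names and records below are hypothetical sample data.

```python
import numpy as np

# Hypothetical structured dtype: unicode name, 32-bit int, 64-bit float.
dt = np.dtype([("Name", "U10"), ("Age", "i4"), ("Height", "f8")])
people = np.array([("Smith", 45, 1.83), ("Jones", 53, 1.72)], dtype=dt)

names = people["Name"]                  # select a column by field name
mean_height = people["Height"].mean()   # call a method on that column
second_row = people[1]                  # select a record by row index
second_age = people[1]["Age"]           # select one field of one record

print(names)
print(mean_height)
print(second_row)
print(second_age)
```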
Key Takeaways for dtype:
- It’s like a template to tell the to-be-assigned data, “We want this type of format.”
- Like a SQL database.
- Opens access for the data to be searched by names and indexes.
Vectorization allows computations to be faster. Basic arithmetic operators act element-wise by default. You can combine an array with another array of compatible shape, or an array with a single number (the number is then applied to every element).
More on numpy’s broadcasting and its handling
Key points on matrices in linear algebra versus NumPy:
- Operation distinction: NumPy uses distinct operators: `*` for element-wise multiplication (Hadamard product) and `@` or `np.dot()` for linear algebra’s matrix multiplication.
- Matrix multiplication (`@`, `np.dot`): Both linear algebra and NumPy require matching inner dimensions (e.g., (m x n) @ (n x p)). Order is crucial (in general AB ≠ BA). NumPy flexibly treats 1D arrays as row or column vectors as needed to satisfy this rule.
- Element-wise multiplication (`*`):
    - In linear algebra (Hadamard product), this strictly requires matrices to have the exact same shape. Order doesn’t matter.
    - In NumPy, `*` performs this element-wise operation. If shapes differ but are compatible, NumPy uses broadcasting to virtually align them without needing identical shapes. Order doesn’t matter.
- Broadcasting: This is a NumPy mechanism for element-wise operations (`*`, `+`, etc.) that handles arrays of compatible but different shapes efficiently, without making data copies. For `@`, only leading (batch) dimensions are broadcast; the inner dimensions must still match.
- Shape flexibility: Linear algebra has rigid shape requirements (identical for element-wise, matching inner dimensions for matrix multiplication). NumPy is more flexible primarily due to broadcasting for element-wise operations and its adaptive handling of 1D arrays in matrix multiplication.
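The distinction between `*`, `@` and broadcasting is easiest to see on small integer matrices. The arrays below are hypothetical examples.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
row = np.array([10, 20])

elementwise = A * B  # Hadamard product: multiplies matching positions
matmul = A @ B       # matrix multiplication: rows of A times columns of B
broadcast = A + row  # row has shape (2,) and is added to every row of A
mat_vec = A @ row    # the 1D array is treated as a column vector

print(elementwise)  # [[ 5 12] [21 32]]
print(matmul)       # [[19 22] [43 50]]
print(broadcast)    # [[11 22] [13 24]]
print(mat_vec)      # [ 50 110]
```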
Memory Layout: The memory layout of an ndarray object doesn’t affect the overall sum calculation. However, summing over rows and columns is faster with C-ordered arrays. Specifically, summing over rows is relatively faster than summing over columns in C-ordered arrays, while the opposite is true for F-ordered arrays.
Conclusion
NumPy is the preferred Python package for numerical computing, offering the efficient ndarray class and vectorized operations that minimize slow Python loops. These techniques are also applicable to pandas DataFrames.
Chap5 pandas
#code #pandas
![[Pasted image 20250411211554.png]]
Denote DataFrame as df for the following context.
df is the core of pandas.
Key Takeaways:
- pandas manages indexed and labeled data.
- Data can be in basic `TYPES` and `ndarrays`.
- Data can be organized in cols with names.
- Index can be in numbers, `str`, `datetime`.
- Vectorization is almost, if not always, faster than loops.
- Applying vectorization explicitly is almost, if not always, faster than applying it implicitly.
- pandas supports some intuitive data manipulations, such as appending a col to the existing df (`df['a_new_col'] = (<new_col_value1>, <new_col_value2>)`)
- It’s always a good practice to explicitly assign an `index` value when appending new data.
- Good practice to instantiate a data frame object:
    - instantiate a `np.array`
    - instantiate `dataframe` from `np.array`
    - then assign column names like `df.columns = ['col1', 'col2', ...]`
- For financial research purposes, time is of the essence.
- In general, scalars are applied to arrays and `dataframe` objects element-wise. There are specific built-in methods for vector manipulations.
- `pandas` provides a wrapper around `matplotlib` specifically for `DataFrame`.
In [6]: %time df.apply(lambda x: x ** 2)
CPU times: user 1.05 ms, sys: 35 μs, total: 1.08 ms
Wall time: 1.09 ms
Out[6]:
numbers
a 100
b 400
c 900
d 1600
In [7]: %time df ** 2
CPU times: user 350 μs, sys: 22 μs, total: 372 μs
Wall time: 379 μs
Out[7]:
numbers
a 100
b 400
c 900
d 1600
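The recommended way to instantiate a `DataFrame` (ndarray first, then the frame, then the column names) can be sketched as follows. The column names, dates and values are hypothetical, and no timings are claimed here.

```python
import numpy as np
import pandas as pd

# Step 1: instantiate an ndarray (hypothetical sample values).
a = np.arange(12, dtype=float).reshape((4, 3))

# Step 2: instantiate the DataFrame from the ndarray.
df = pd.DataFrame(a)

# Step 3: assign column names and an explicit index.
df.columns = ["No1", "No2", "No3"]
df.index = pd.date_range("2019-01-01", periods=4, freq="D")

# Explicit vectorization and apply with a lambda give the same values.
squared_vec = df ** 2
squared_apply = df.apply(lambda x: x ** 2)
same_result = np.allclose(squared_vec.to_numpy(), squared_apply.to_numpy())

print(df)
print(same_result)
```

Running both variants under `%time` in IPython, as in the session above, shows how the two compare on a given machine.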
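#syntax Core methods:
- `import pandas as pd`
- `df = pd.DataFrame()`
- `df.index`
- `df.columns = ['col1', 'col2', ...]`
- `df.loc`, `df.iloc`
- `df.sum`, `df.mean`, `df.std`, `df.cumsum`
- `df.apply`
- `df + scalar`, `df - scalar`, `df * scalar`, `df / scalar`
- `df['index']['index2']`
- `pd.concat` (a top-level function, not a `df` method)
- `df.append` (removed in pandas 2.0; use `pd.concat` instead)
- `df.values`
- `pd.date_range` (also a top-level function)
- `np.array(df)`
- `df.info`, `df.describe`
- `np.mean(df)`, `np.log(df)`, `np.sqrt(abs(df))`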
![[Pasted image 20250414112934.png]] ![[Pasted image 20250414113505.png]] ![[Pasted image 20250414114632.png]]
The Series Class
A series object is a single column of data from within a DataFrame object.
Key takeaways
#code #syntax #pandas
- Instantiation: `s = pd.Series(<ndarray>)`; `s = df['col']`
- Basic `DataFrame` methods apply to Series objects as well.
- Comparison/logical operators can be applied to `DataFrame`.
- Which enables data selection by complex condition, like:
    - `df['col'] <op> value`, with `<`, `>` and combinations via `&`, `|`, returns `bool`s
    - `df[<condition>]` or `df.query('criteria_that_returns_bool')` returns the values that match the criteria
    - Very important: `df.query()`
- `df.append` (removed in pandas 2.0), `pd.concat`
- `df.join` (combines frames on their index values, as defined by the `how` flag: `'left'` keeps the index of the 1st df, `'right'` keeps the index of the 2nd df, `'inner'` keeps only the index values present in both, `'outer'` keeps the index values present in either)
- `pd.merge(df1, df2, on='something_common')`
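Selection, joining and merging can be sketched on two tiny frames. All names and values below are hypothetical sample data.

```python
import pandas as pd

left = pd.DataFrame({"A": [100, 50, 200]}, index=["a", "b", "c"])
right = pd.DataFrame({"B": [15.0, 30.0]}, index=["b", "d"])

# Selection: a Boolean Series used as a filter, and the query equivalent.
mask = left["A"] > 75
selected = left[mask]
queried = left.query("A > 75")

# Joining on the index; `how` decides which index values survive.
joined_left = left.join(right, how="left")    # index of left: a, b, c
joined_inner = left.join(right, how="inner")  # shared index only: b
joined_outer = left.join(right, how="outer")  # all index values: a, b, c, d

# Merging on a common column instead of the index.
c = pd.DataFrame({"key": ["a", "b", "c"], "A": [100, 50, 200]})
d = pd.DataFrame({"key": ["b", "d"], "B": [15.0, 30.0]})
merged = pd.merge(c, d, on="key")

print(selected)
print(joined_outer)
print(merged)
```

Rows without a partner in the other frame receive `NaN` in the left, right and outer cases.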
Groupby
- Instantiation: `groups = df.groupby('col')`
- Selection of data: `groups = df.groupby(['col1', 'col2', ...])`
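A minimal `groupby` sketch, grouping first by one column and then by two. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q1", "Q2"],
    "Odd_Even": ["Odd", "Even", "Odd", "Even"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Group by a single column, then aggregate.
groups = df.groupby("Quarter")
sizes = groups.size()
means = groups["Value"].mean()

# Group by several columns at once.
multi = df.groupby(["Quarter", "Odd_Even"])["Value"].sum()

print(sizes)
print(means)
print(multi)
```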
Performance
TL;DR: #KIM
- Working with the columns (Series objects) directly is the fastest approach. “Directly” means no callables, no iterables, just plain `df['col']` or the like.
- `np.ndarray` is faster than `pd.DataFrame`.
- Calling `apply` with `lambda` functions, or anything else that loops over all data entries in a `df`, is almost ALWAYS the SLOWEST approach.
Conclusion
Pandas is a central tool in the PyData ecosystem, offering the DataFrame class for efficient tabular data manipulation. It supports vectorized operations for concise, high-performance code and provides robust handling of incomplete datasets. Pandas will be extensively utilized in subsequent chapters, introducing additional features as needed.
Chap6: OOP
TL;DR
#KIM
- OOP is the go-to approach in finance.
- OOP is suited for abstract problem domains like finance. It is more intuitive for the brain: more structured, more human-readable, less complex…
- GPT summarized: Object-Oriented Programming (OOP) aligns with natural human thinking, reduces complexity, and enables modular, abstract, and reusable code. It supports features like inheritance, encapsulation, polymorphism, aggregation, and composition—enhancing flexibility, maintainability, and user interface design. OOP is also the dominant paradigm in Python, promoting nonredundant, efficient software development.
- All objects in Python have attributes, methods, etc.
Glossary: #KIM
- Class: A group of objects or designs.(Human)
- Object: An instance of a class.(Juan Pan)
- Attribute: A feature of the class.(Juan has dark eyes)
- Method: A function of the class.
- Parameters: Input for the method.
- Instantiation: the process of creating a specific object based on an abstract class.
E.g.:
class HumanBeing(object):
    def __init__(self, first_name, eye_color):
        self.first_name = first_name
        self.eye_color = eye_color
        self.position = 0
    def walk_steps(self, steps):
        self.position += steps
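A short usage sketch tying the glossary terms to the class above (the class is repeated so the block runs on its own; the object name `juan` follows the glossary example):

```python
class HumanBeing:
    def __init__(self, first_name, eye_color):
        self.first_name = first_name   # attribute
        self.eye_color = eye_color     # attribute
        self.position = 0              # attribute with a default value

    def walk_steps(self, steps):       # method; `steps` is its parameter
        self.position += steps


juan = HumanBeing('Juan', 'dark')      # instantiation: class -> object
juan.walk_steps(5)                     # method call changes the object's state
juan.walk_steps(-2)
```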
Syntax
#syntax #code Attributes are like variables for different scopes.
![[Pasted image 20250414233922.png]] ![[Pasted image 20250414234022.png]]
Python Data Model(Very Important)
The Python Data model allows one to design classes that consistently interact with basic language constructs of Python, including:
- Iteration
- collection handling
- attribute access
- operator overloading
- function and method invocation
- object creation and destruction
- string representation
- managed contexts
#syntax #code Special methods (names with leading and trailing `__`; they are not private, they are hooks that Python calls for you, e.g. `len(obj)` calls `obj.__len__()`):
- `__init__`
- `__repr__`
- `__add__`
- `__mul__`
- `__bool__`
- `__len__`
- `__getitem__`
- `__iter__`: returns an iterator
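A minimal sketch of a class implementing the special methods listed above. The `Vector` class and its behavior (e.g. `__mul__` as scalar multiplication) are illustrative choices, not the only sensible design:

```python
class Vector:
    """Hypothetical 3-d vector showing the Python data model hooks."""

    def __init__(self, x=0, y=0, z=0):
        self.x, self.y, self.z = x, y, z

    def __repr__(self):                      # string representation
        return 'Vector(%r, %r, %r)' % (self.x, self.y, self.z)

    def __add__(self, other):                # v + w
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)

    def __mul__(self, scalar):               # v * 2
        return Vector(self.x * scalar, self.y * scalar, self.z * scalar)

    def __bool__(self):                      # bool(v): False only for the zero vector
        return bool(self.x or self.y or self.z)

    def __len__(self):                       # len(v): number of components
        return 3

    def __getitem__(self, i):                # v[i]
        return (self.x, self.y, self.z)[i]

    def __iter__(self):                      # for c in v: ...
        return iter((self.x, self.y, self.z))


v = Vector(1, 2, 3)
w = v + v * 2
```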
Conclusion
#code #syntax ![[Pasted image 20250414235631.png]] This chapter introduces object-oriented programming (OOP) in Python, highlighting its theoretical foundations and practical applications. OOP enables the modeling of complex systems through custom objects that integrate seamlessly with Python’s flexible data model. While some critique OOP, it offers powerful tools for managing complexity and abstraction. The derivatives pricing package discussed in Part V exemplifies a scenario where OOP is the most suitable paradigm to address intricate requirements.
Part 3 Getting Started with Quant
Index:
- Chap 7: plotting
- Chap 8: using pandas to handle time-series data
- Chap 9: I/O done right and fast
- Chap 10: code performance
- Chap 11: math
- Chap 12: implementing methods from stochastics
- Chap 13: statistical and machine learning approaches
Chap 7: Visualization
Tools:
- `matplotlib`
- `plotly`
KIM:
- `matplotlib` can parse `ndarray` objects (to a point).
- Use styles.
![[Pasted image 20250415135738.png]] #syntax Switches and readability:
- `plt.xlim(min, max)`: limits of the x axis
- `plt.ylim(min, max)`: limits of the y axis
- `plt.title`
- `plt.xlabel`
- `plt.ylabel`
- Pass the color as an argument: `plt.plot(y, 'color_code')`
- `plt.legend()`
![[Pasted image 20250415140051.png]] ![[Pasted image 20250415140159.png]] ![[Pasted image 20250415140637.png]]
Going 2D & 3D
#KIM Be mindful of:
- Scaling
- 1st Approach: use two y-axes(left/right, share the same x-axis)
- 2nd approach: use two subplots(upper/lower, left/right)
- Separate styles
- `matplotlib` can parse sub-datasets, but to a point.
- Sometimes, visualizing in different ways simultaneously is necessary.
- Line and point plots are the most important ones in finance.
- Scatter plots can be used to compare the returns of two assets.
- Important: Histogram can be used to visualize financial returns.
- 3D plots can be used to visualize volatility surfaces.
Interactive
Dependent modules:
- `plotly`
- `cufflinks`
- Use styles and plotting types.
#syntax #code
cf’s useful methods: **add_annotations**, **add_atr**, **add_bollinger_bands**, **add_cci**, **add_dm**, **add_ema**, **add_macd**, **add_ptps**, **add_resistance**, **add_rsi**, **add_shapes**, **add_sma**, **add_support**, **add_trendline**, **add_volume**, iplot
Conclusion
Good practices:
- Always consult the gallery #matplotlib first, then start with the example code.
A lot of #resources:
- http://matplotlib.org
- http://matplotlib.org/gallery.html
- http://matplotlib.org/users/pyplot_tutorial.html
- http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html
- http://plot.ly
- https://plot.ly/python/getting-started/
- https://github.com/santosjorge/cufflinks
Chap 8: Financial Time Series
#KIM :
- Financial time series data is one of the most important types of data in finance.
- Time, time, time.
But first, DATA.
#KIM :
- Thomson Reuters (TR) Eikon Data API
- Good #practices:
- First, import data from csv.
- Take a first look at the data (inspecting / visualizing):
  - `data.head()`
  - `data.tail()`
  - `data.plot()`
- Have some basic statistics, check data validity, etc.:
  - `data.info()`
  - `data.describe()`
  - `data.mean()`
  - `data.aggregate([min, max])`
  - `df.round()`
  - `np.mean`
  - `np.std`
  - `np.median`
- Changes over time:
  - `data.diff()[.head()]`: absolute changes in value.
  - `data.pct_change()[.round(<decimal_places>)[.head()]]`: Usually percentage changes are preferred.
  - `rets = np.log(data / data.shift(1))`: Log returns are preferred as well.
  - #KIM : Check log returns before any analysis happens.
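The three "changes over time" calls on a tiny made-up price series (the numbers are hypothetical, not market data). The last line shows one reason log returns are handy: they simply add up over time.

```python
import numpy as np
import pandas as pd

# Hypothetical price series (made-up numbers).
data = pd.DataFrame(
    {'price': [100.0, 102.0, 101.0, 105.0]},
    index=pd.date_range('2020-01-01', periods=4, freq='D'),
)

diffs = data.diff()                   # absolute changes
pct = data.pct_change()               # percentage changes
rets = np.log(data / data.shift(1))   # log returns

# Log returns are additive: their sum is the total log return.
total_log_return = rets['price'].sum()
```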
- Resampling:
- Downsampling: e.g. tick data resampled to 1-minute intervals, or daily data resampled to monthly intervals.
- Rolling statistics, a.k.a. financial indicators: #Extremely_important for technical analysis.
  - `data.rolling(window=...).min()`
  - `data.rolling(window=...).max()`
  - `data.rolling(window=...).std()`
  - `data.rolling(window=...).median()`
  - `data.ewm(...).mean()`
  - custom indicators using the `.apply` method.
- SMA strat: long when shorter-term SMA is above the longer-term SMA and vice versa, meaning:
- Like a flip switch, trades only take place when the two SMA lines intersect.
- Only a few trades will happen over the years.
- SMAs are used to derive positions to implement a trading strat, it’s a means to an end.
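A sketch of the SMA crossover logic described above, run on a simulated price path (the random-walk prices and the 42/252-day windows are illustrative assumptions, as are the column names):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical price path: exponential of a random walk, 1000 "days".
price = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000)))
data = pd.DataFrame({'price': price})

data['SMA1'] = data['price'].rolling(window=42).mean()    # shorter-term SMA
data['SMA2'] = data['price'].rolling(window=252).mean()   # longer-term SMA
data.dropna(inplace=True)                                 # drop the warm-up rows

# Long (+1) when the short SMA is above the long SMA, short (-1) otherwise.
data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)

# A trade only happens when the position flips, i.e. when the SMAs cross.
trades = int((data['position'].diff().fillna(0) != 0).sum())
```

Comparing `trades` with `len(data)` makes the "only a few trades" point concrete: the position changes far less often than the price does.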
- Correlation: S&P v.s. VIX as an example: Choose plots with different scalings when dealing with this kind of problem(Correlation)
- #KIM Correlation is NEVER Causation.
- OLS Regression(Linear Regression):
reg = np.polyfit(rets['.SPX'], rets['.VIX'], deg=1)
ax = rets.plot(kind='scatter', x='.SPX', y='.VIX', figsize=(10, 6))
ax.plot(rets['.SPX'], np.polyval(reg, rets['.SPX']), 'r', lw=2);
- `pd.DataFrame.corr`: Compute pairwise correlation of columns, excluding NA/null values.
This chapter introduces financial time series—crucial datasets in finance—and highlights how the pandas library facilitates their analysis. Pandas offers efficient tools for analyzing, visualizing, importing, and exporting time series data across various formats. These capabilities are further demonstrated in the following chapter.
Chap 9
#context:
- I/O is OFTEN, if not always, the bottleneck of data analysis.
- Data has to be read into memory and processed there; results have to be written to disk.
- Analytic data of less than 1GB is a sweet spot for Python.
- Use the `pickle` package.
- `pickle` follows the FIFO principle (objects come back in the order they were written) and stores no metadata, which makes a pickle file hard for a human to interpret. So store the objects in a `dict` with some keys.
- `pickle` ships with the standard library, but a pickled object depends on the package that defines its class. If that package's version changes, compatibility issues may arise. So consider the built-in R/W of `numpy` and `pandas`.
- `pandas` can read from a lot of formats. ![[Pasted image 20250416165202.png]]
Basics
- `pickle.dump()`
- `pickle.load()`
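A sketch of the "store it in a `dict`" advice: one `dump`, one `load`, and the keys tell you what came back. The file name and the toy arrays are made up:

```python
import os
import pickle
import tempfile

import numpy as np

x = np.arange(5)
payload = {'x': x, 'y': x ** 2}      # keys document what each object is

# Write to a temporary location (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
with open(path, 'wb') as f:
    pickle.dump(payload, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)        # the whole dict comes back at once
```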
Good practices #KIM :
- Use `numpy`’s built-in `save`/`load`; it’s faster than SQL or `pickle`.
- Use `PyTables` and the `h5` (HDF5) STRUCTURE.
- The package name is PyTables, the import name is `tables`.
- Use as few `for` loops as possible (a last resort).
- `pandas` and `PyTables` suffice for the performance needs of SQL-like querying.
- Using the compression `PyTables` provides never hurts.
- Always use `HDF5` with `pandas`.
- More conveniently and specifically for finance, use `TsTables`, created by Yves (or build your own based on it).
Ranking STRUCTURES by I/O performance(top is the fastest):
- `h5`s
- `np`
- SQL
- `pd.to_csv`
- `pd.to_excel`
Resources
Conclusion
While relational (SQL) databases handle complex data relationships effectively, array-based approaches using tools like NumPy native I/O, PyTables, or Pandas with HDF5 often provide significant performance advantages for finance and science applications dealing with array-centric data. TsTables is specifically highlighted for large time series datasets, particularly in write-once, read-many scenarios.
Regarding hardware, the text advises caution against automatically choosing cloud-based scale-out solutions. It suggests evaluating whether fewer, more powerful “scale-up” servers (with many cores, large memory, potentially GPUs/TPUs) might offer comparable or superior performance and cost-efficiency for specific analytics workloads, citing a Microsoft study.
Ultimately, the recommendation is to first thoroughly analyze the specific data analytics tasks required, and then make an informed decision on the optimal hardware (scale-up vs. scale-out) and software architecture, as these choices significantly impact performance.
Chap 10: All About Performance
#KIM Good Practices:
- Use vectorization (`numpy`).
- Compile to binary:
  - Dynamically (`numba`; `numba` wins overall).
  - Statically (`cython`).
- Multiprocessing: useful when many different problems of the same type need to be solved.
Pros and cons of each #KIM :
- `numpy` uses vectorization, which considerably improves speed over standard Python, but might use more RAM.
- `numba` works like a charm, but only for fairly specific use cases.
- `cython` uses less RAM and is very fast, but is essentially C + Python, so it takes more effort to rewrite the code in a C-like style.
- Recursive algos tend to recalculate the same sub-problems over and over. Using a cache decorator can dramatically improve performance.
- Always keep stack overflow (deep recursion) and integer overflow in mind: fixed-width `TYPES` (e.g. the 64-bit integers used by `numpy`, `numba`, and `cython`) have bit limitations, unlike Python's own arbitrary-precision `int`.
Prime numbers
Prime number algorithms are an important performance benchmark and also matter in encryption.
Fibonacci
A typical recursive problem.
- Use iterative approach.
- Use cache.
- (Optional) Use Cython `int128` TYPE.
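The first two options can be sketched as follows (function names are hypothetical; the convention here is `fib(0) = 0`, `fib(1) = 1`). The plain recursive version recomputes the same sub-problems exponentially often, while the cached and iterative versions do linear work:

```python
from functools import lru_cache


def fib_rec(n):
    """Plain recursion: simple, but exponential running time."""
    return n if n < 2 else fib_rec(n - 1) + fib_rec(n - 2)


@lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but every result is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def fib_iter(n):
    """Iterative version: no recursion, constant memory."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x
```

Python's own `int` never overflows, so `fib_iter(1000)` just works; the bit-limit concern only appears once fixed-width integer types enter (Numba, Cython, NumPy).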
The number Pi
Use Monte Carlo simulation to estimate π.
#KIM :
- Generating lots of random numbers consumes a lot of RAM.
- The methods behind these algorithms work similarly, if not identically, in the financial context.
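A vectorized sketch of the estimate: draw uniform points in the square [-1, 1]², count the share that falls inside the unit circle, and multiply by 4 (the area ratio circle/square is π/4). The function name `mcs_pi` and the seed are arbitrary choices:

```python
import numpy as np


def mcs_pi(n, seed=100):
    """Estimate pi from n uniform random points in [-1, 1] x [-1, 1]."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2)) * 2 - 1          # n points; this array is the RAM cost
    inside = (pts ** 2).sum(axis=1) <= 1.0    # True if the point lies in the unit circle
    return 4 * inside.mean()                  # share inside, scaled by the area ratio


pi_est = mcs_pi(1_000_000)
```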
Binomial Trees
I don’t get it.
Monte Carlo Simulation
#KIM :
- MCS is an important numeric tool in finance.
- Many algos can benefit from multiprocessing. MCS is a good case.
- Usually, regular Python is fine. But in production, always apply the BEST solution, even if it takes more effort.
Conclusion
The Python ecosystem offers several accessible ways to significantly improve code performance. Key approaches include:
- Using efficient Python idioms and paradigms, like vectorization, which often leads to more concise and faster code (though sometimes uses more memory).
- Leveraging specialized high-performance packages such as NumPy and Pandas for array and DataFrame operations.
- Compiling Python code using tools like Numba (dynamic compilation) or Cython (static compilation), particularly effective for financial algorithms.
- Parallelizing code execution, commonly achieved using the multiprocessing package on a single machine, with further options available for cluster computing.
A major advantage highlighted is that these performance techniques are generally easy to implement using existing libraries, often representing readily achievable gains (“low-hanging fruit”).
Resources
Chap 11 On math
The function used in this chap:
$$ f(x) = \sin(x) + \frac{1}{2} x $$ Code:
def f(x):
    return np.sin(x) + 0.5 * x
Regression approach
Regression basically takes a set of basis functions and then calculates the best parameters for these functions to approximate the example function.
Code:
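import numpy as np
import matplotlib.pyplot as plt  # needed by create_plot
def f(x):
    return np.sin(x) + 0.5 * x
x = np.linspace(-2 * np.pi, 2 * np.pi, 50)  # evaluation grid (assumed; same interval as in the interpolation code)
res = np.polyfit(x, f(x), deg=1, full=True)  # deg=1 means linear.
def create_plot(x, y, styles, labels, axlabels):
    plt.figure(figsize=(10, 6))
    for i in range(len(x)):
        plt.plot(x[i], y[i], styles[i], label=labels[i])
    plt.xlabel(axlabels[0])
    plt.ylabel(axlabels[1])
    plt.legend(loc=0)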
#KIM :
- You can improve the approximation by adjusting the degree (`deg`).
- Also by adjusting basis functions.
- Regression can cope well with noise in the data.
- Regression can also cope with unsorted data.
- Regression works with N-Dimensions without any dramatic change in code.
- Implementation is easy.
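A sketch of the "adjust the degree" point: fit monomials of increasing degree to the example function and compare the mean squared error. The grid size of 50 and the helper name `mse_for_degree` are assumptions for illustration:

```python
import numpy as np


def f(x):
    return np.sin(x) + 0.5 * x


x = np.linspace(-2 * np.pi, 2 * np.pi, 50)


def mse_for_degree(deg):
    """Least-squares polynomial fit of the given degree and its MSE."""
    coeffs = np.polyfit(x, f(x), deg=deg)    # optimal parameters
    fitted = np.polyval(coeffs, x)           # regression values on the grid
    return np.mean((f(x) - fitted) ** 2)


mse1 = mse_for_degree(1)    # linear
mse5 = mse_for_degree(5)
mse7 = mse_for_degree(7)
```

The error shrinks as the degree grows, since each higher-degree fit contains the lower-degree one as a special case.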
Interpolation
Basically, interpolation finds splines (piecewise polynomial curves that pass through the data points and have continuous derivatives) across the data.
#KIM :
- Spline interpolation is often used in finance.
- Limited to low dimension problems.
- Require sorted data.
Code:
import scipy.interpolate as spi
x = np.linspace(-2 * np.pi, 2 * np.pi, 25)
![[Pasted image 20250417135209.png]]
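A sketch with SciPy's `splrep`/`splev` pair on the same 25 sorted points: linear splines (`k=1`) versus cubic splines (`k=3`), checked on a finer sub-grid. The sub-interval [1, 3] is an arbitrary choice for the comparison:

```python
import numpy as np
import scipy.interpolate as spi


def f(x):
    return np.sin(x) + 0.5 * x


x = np.linspace(-2 * np.pi, 2 * np.pi, 25)   # sorted data points

ipo1 = spi.splrep(x, f(x), k=1)              # linear splines
ipo3 = spi.splrep(x, f(x), k=3)              # cubic splines

iy = spi.splev(x, ipo1)                      # exact at the data points

# Between the data points the spline order matters.
xd = np.linspace(1.0, 3.0, 50)
err1 = np.max(np.abs(f(xd) - spi.splev(xd, ipo1)))
err3 = np.max(np.abs(f(xd) - spi.splev(xd, ipo3)))
```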
Convex optimization
#KIM :
- Convexity is important: for a convex function, any local minimum is also the global minimum.
- Generally, run a global optimization before a local one (there can be multiple local minima, and a local algo can get trapped in one of them).
- Crucial #Extremely_important : Know which optimization method fits which problem.
Global optimization by Brute Force
Adjusting step size can considerably help improve the accuracy of the result.
E.g.: let fm be a 2d function: #syntax
import numpy as np
import scipy.optimize as sco
def fm(p):
    x, y = p
    return (np.sin(x) + 0.05 * x ** 2
            + np.sin(y) + 0.05 * y ** 2)
sco.brute(fm, ((-10, 10.1, 5), (-10, 10.1, 5)), finish=None)  # step size is 5.
sco.brute(fm, ((-10, 10.1, 0.1), (-10, 10.1, 0.1)), finish=None)  # step size is 0.1.
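A runnable sketch of the step-size effect, followed by a local refinement that starts from the brute-force result. Variable names (`coarse`, `fine`, `local`) are made up; `disp=False` just silences the optimizer's printout:

```python
import numpy as np
import scipy.optimize as sco


def fm(p):
    x, y = p
    return np.sin(x) + 0.05 * x ** 2 + np.sin(y) + 0.05 * y ** 2


# Global search on a grid; each range is (start, stop, step).
coarse = sco.brute(fm, ((-10, 10.1, 5), (-10, 10.1, 5)), finish=None)
fine = sco.brute(fm, ((-10, 10.1, 0.1), (-10, 10.1, 0.1)), finish=None)

# Local search started from the best grid point.
local = sco.fmin(fm, fine, xtol=0.001, ftol=0.001, disp=False)
```

With step size 5 the grid only contains -10, -5, 0, 5, 10 per axis, so the search stops at the origin; the finer grid lands near (-1.4, -1.4), and the local step polishes that further.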
Local optimization
Code: #syntax
sco.fmin(fm, opt1, xtol=0.001, ftol=0.001, maxiter=15, maxfun=20)  # args: function to be minimized, starting parameter values (opt1, e.g. the brute-force result), input parameter tolerance, function value tolerance, max number of iterations, max number of function calls.
Constrained optimization
Code #syntax :
cons = ({'type': 'ineq','fun': lambda p: 100 - p[0] * 10 - p[1] * 10})# constraint
bnds = ((0, 1000), (0, 1000))# bounds
result = sco.minimize(Eu, [5, 5], method='SLSQP',bounds=bnds, constraints=cons)# Eu is the function to be optimized.
Integration
- Applies the most to valuation and pricing.
Code for plotting:
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
# Assumes f, the integration limits a and b, and the curve x, y = f(x) are already defined.
fig, ax = plt.subplots(figsize=(10, 6))
plt.plot(x, y, 'b', linewidth=2)
plt.ylim(bottom=0)
Ix = np.linspace(a, b)
Iy = f(Ix)
verts = [(a, 0)] + list(zip(Ix, Iy)) + [(b, 0)]
poly = Polygon(verts, facecolor='0.7', edgecolor='0.5')
ax.add_patch(poly)
plt.text(0.75 * (a + b), 1.5, r"$\int_a^b f(x)dx$",
         horizontalalignment='center', fontsize=20)
plt.figtext(0.9, 0.075, '$x$')
plt.figtext(0.075, 0.9, '$f(x)$')
ax.set_xticks((a, b))
ax.set_xticklabels(('$a$', '$b$'))
ax.set_yticks([f(a), f(b)]);
Code for computing:
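import scipy.integrate as sci
sci.fixed_quad(f, a, b)[0]  # fixed Gaussian quadrature
sci.quad(f, a, b)[0]  # adaptive quadrature
xr = np.linspace(a, b, 2 ** 5 + 1)  # romb needs 2**k + 1 equally spaced samples
sci.romb(f(xr), dx=xr[1] - xr[0])  # Romberg integration on samples
xi = np.linspace(0.5, 9.5, 25)
sci.trapezoid(f(xi), x=xi)  # trapezoidal rule
sci.simpson(f(xi), x=xi)  # Simpson's rule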
By simulation:
for i in range(1, 20):
    np.random.seed(1000)
    x = np.random.random(i * 10) * (b - a) + a
    print(np.mean(f(x)) * (b - a))
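To see how close the numerical answers get, here is a sketch comparing adaptive quadrature and a Monte Carlo estimate against the analytical value $\int_a^b (\sin x + \tfrac12 x)\,dx = \left[-\cos x + \tfrac14 x^2\right]_a^b$. The limits `a = 0.5`, `b = 9.5` are taken from the `xi` grid in the notes; the sample size and seed are arbitrary:

```python
import numpy as np
import scipy.integrate as sci


def f(x):
    return np.sin(x) + 0.5 * x


a, b = 0.5, 9.5

# Analytical value from the antiderivative -cos(x) + x**2 / 4.
exact = (-np.cos(b) + 0.25 * b ** 2) - (-np.cos(a) + 0.25 * a ** 2)

# Adaptive quadrature.
quad_val = sci.quad(f, a, b)[0]

# Monte Carlo: average of f at uniform random points times interval length.
rng = np.random.default_rng(1000)
u = rng.random(100_000) * (b - a) + a
mc_val = np.mean(f(u)) * (b - a)
```

The Monte Carlo estimate converges slowly (error shrinks with the square root of the sample size), but the same recipe carries over to high-dimensional problems where grid-based rules become impractical.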
Symbolic Computation
#KIM :
- use SymPy
- SymPy auto-simplifies math expressions.
- SymPy has 3 rendering engines for printing:
  - LaTeX
  - Unicode
  - ASCII
- Can pretty-print math expressions.
- Valuable to financial math.
SymPy basics #syntax :
import sympy as sy
x = sy.Symbol('x')
y = sy.Symbol('y')
sy.sqrt(x)
f = x ** 2 + 3 + 0.5 * x ** 2 + 3 / 2
sy.simplify(f)
Equations, Integration and Differentiation
- Use `sy.solve` to solve simple equations.
E.g. solving integration and differentiation:
![[Pasted image 20250417163707.png]] ![[Pasted image 20250417164050.png]]
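A small SymPy sketch covering the three operations, applied to the chapter's example function. `sy.Rational` is used instead of floats so the results stay exact; the integration limits 1/2 and 19/2 mirror the numerical integration example:

```python
import sympy as sy

x = sy.Symbol('x')

# Solving: sy.solve assumes "expression = 0".
roots = sy.solve(x ** 2 - 1)

# The example function, with an exact 1/2 instead of the float 0.5.
f = sy.sin(x) + sy.Rational(1, 2) * x

F = sy.integrate(f, x)       # indefinite integral (antiderivative)
dF = sy.diff(F, x)           # differentiating it recovers f

# Definite integral with exact limits, evaluated to a float afterwards.
area = sy.integrate(f, (x, sy.Rational(1, 2), sy.Rational(19, 2)))
area_float = float(area)
```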
Conclusion
This chapter introduces four key mathematical topics and tools relevant to finance:
- Function Approximation: Important for applications like factor models, yield curve interpolation, and regression-based Monte Carlo methods for American options.
- Convex Optimization: Frequently used in finance for tasks such as calibrating option pricing models to market data or implied volatilities.
- Numerical Integration: Central to pricing options and derivatives, often involving calculating the discounted expected payoff under a risk-neutral measure (linking to stochastic process simulation covered in Chapter 12).
- Symbolic Computation (using SymPy): Highlighted as a potentially useful and efficient tool for specific mathematical operations like symbolic integration, differentiation, and solving equations.
Resources
Chap12 Stochastics
#glossary Extremely simplified:
- Stochastic process: a sequence of random variables, wherein a draw generally depends on the previous draw(s).
- Markov property: tomorrow’s value of the process only depends on today’s state.
#KIM :
- MCS is among #Extremely_important THE MOST IMPORTANT numerical techniques in finance.
- Choose wisely the `TYPES`, `STRUCTURES` as well as algos to tackle different types of problems.
- Important for valuation.
- Important for risk management.
Random numbers
Code:
import numpy.random as npr
npr.seed(100)
npr.rand(10)
![[Pasted image 20250417172252.png]] ![[Pasted image 20250417172449.png]] ![[Pasted image 20250417172514.png]]
BSM model
![[Pasted image 20250417173531.png]]
GBM
![[Pasted image 20250417175731.png]] ![[Pasted image 20250417175812.png]]
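A vectorized sketch of simulating geometric Brownian motion paths with the standard exact discretization $S_t = S_{t-\Delta t}\exp\left((r - \tfrac12\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,z_t\right)$, $z_t$ standard normal. All parameter values and the seed are illustrative assumptions:

```python
import numpy as np

# Illustrative parameters (assumed): initial level, short rate,
# volatility, horizon in years.
S0, r, sigma, T = 100.0, 0.05, 0.25, 2.0
M, I = 50, 10_000                 # time steps, number of paths
dt = T / M

rng = np.random.default_rng(100)
z = rng.standard_normal((M, I))   # one standard normal draw per step and path

# Log increments per step; summing them keeps the scheme exact for GBM.
log_inc = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z

S = np.empty((M + 1, I))
S[0] = S0
S[1:] = S0 * np.exp(np.cumsum(log_inc, axis=0))
```

A quick sanity check is that the average terminal value `S[-1].mean()` sits close to `S0 * np.exp(r * T)`, the expected value under this model.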
Square-root diffusion
![[Pasted image 20250417180115.png]]
Heston volatility
![[Pasted image 20250417220914.png]]
Get dizzy about models?
Here’s to demystify all these models:
jump Diffusion
#book Book page 369
![[Pasted image 20250417223558.png]]
You don’t have to understand them all at first sight
- Scan through first.
- Refer to book Chap 12 repeatedly.
- Learning by practicing.
VaR
A risk measure widely adopted among industries and practitioners.
CVaR and CVA
Other risk measures.
Conclusion
This chapter focused on Monte Carlo simulation techniques for finance. It explained how to generate pseudo-random numbers according to different distributions and how to simulate the random variables and stochastic processes crucial in financial modeling.
Two major application areas were detailed:
- Valuing both European and American options.
- Estimating risk measures like Value-at-Risk (VaR) and Credit Valuation Adjustments (CVA).
The text concludes that Python combined with NumPy is well-suited for these often computationally demanding simulations. This effectiveness stems from NumPy’s C-based implementation providing significant speed advantages over pure Python, and its support for vectorized operations leading to more compact and readable code.
Chap 13 Statistics
#KIM #Extremely_important :
- Portfolio theory: When the returns are normally distributed, the best portfolio composition depends only on:
- 1. mean return;
- 2. variance of the returns;
- 3. covariance of the returns.
- Capital asset pricing model: When the returns are normally distributed (an assumption: the returns of an individual asset are taken to follow a Gaussian distribution), the price of a single stock has a linear relationship with the market index.
- Efficient market: Prices fluctuate randomly and returns are normally distributed.
- Option pricing theory: BSM model with geometric Brownian motion.
Normality tests
Benchmarks:
- Skewness test
- Kurtosis test
- Normality test
#KIM :
- Real-world data often, if not always, exhibits fat tails (cf. jump diffusion).
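A sketch of running the three tests with `scipy.stats` on two simulated return samples: one normal, one fat-tailed (Student-t with 3 degrees of freedom, used here only as a stand-in for real-world returns). The helper name `normality_report` is made up:

```python
import numpy as np
import scipy.stats as scs

rng = np.random.default_rng(0)
normal_rets = rng.standard_normal(10_000) * 0.01       # simulated normal returns
fat_rets = rng.standard_t(df=3, size=10_000) * 0.01    # simulated fat-tailed returns


def normality_report(arr):
    """Collect skewness, excess kurtosis, and the three test p-values."""
    return {
        'skew': scs.skew(arr),
        'skew_p': scs.skewtest(arr).pvalue,
        'kurt': scs.kurtosis(arr),                 # excess kurtosis, 0 for a normal
        'kurt_p': scs.kurtosistest(arr).pvalue,
        'norm_p': scs.normaltest(arr).pvalue,      # combined skew/kurtosis test
    }


report_normal = normality_report(normal_rets)
report_fat = normality_report(fat_rets)
```

Small p-values speak against normality; for the fat-tailed sample the kurtosis and the combined test reject it clearly.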
MPT(Modern Portfolio Theory)
#KIM :
- MPT is a cornerstone of financial theory.
- MPT’s important factors are:
  - Assumes normal distribution (of a single asset's returns).
  - Assumes mean and variance are the only necessary and sufficient statistics.
- Goal of MPT is to maximize the return while minimizing the risk.
- The covariance matrix is central when selecting assets.
- In the setup used here, shorts are not allowed: only longs, and all the longs add up to 100%.
- Mean-variance portfolio selection:
- Expected portfolio variances
- Expected portfolio return formula (Utility Index): $$ \mu_p = E\left(\sum_{i} w_i r_i\right) $$
- #KIM Sharpe ratio formula:
$$ SR = \frac{\mu_p - r_f}{\sigma_p} $$
- $\mu_p$ is the expected portfolio return (the utility index).
- $r_f$ is the risk-free rate.
- $\sigma_p$ is the standard deviation of the portfolio returns.
- #KIM: In a dollar-neutral portfolio, $\mu_p = \text{Excess Return} + r_f$, which essentially reduces the numerator to just $\mu_p$, the return given by the stock positions. The same works for a long-only strategy.
- #KIM : $r_f$ is usually taken as 0 as long as there is no financing cost.
- #KIM You can either fix a target return level and find the minimal volatility, or fix a volatility level and find the maximal return level; all the optimal portfolios together comprise the efficient frontier. $$ \text{Sharpe Ratio} = \frac{\text{Money the Investment Made} - \text{Money the Piggy Bank Made}}{\text{How Much the Investment Bounced Around}} $$
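A numeric sketch of the three quantities for a two-asset portfolio: expected return $\mu_p = w^T\mu$, volatility $\sigma_p = \sqrt{w^T \Sigma w}$, and the Sharpe ratio. All inputs are made-up numbers, and the zero correlation is chosen only to keep the arithmetic easy to follow:

```python
import numpy as np

# Hypothetical annualized inputs for two assets (made-up numbers).
mu = np.array([0.10, 0.20])            # expected returns
cov = np.array([[0.04, 0.00],
                [0.00, 0.09]])         # covariance matrix of the returns
w = np.array([0.5, 0.5])               # long-only weights adding up to 100%
r_f = 0.0                              # risk-free rate

mu_p = w @ mu                          # expected portfolio return
sigma_p = np.sqrt(w @ cov @ w)         # expected portfolio volatility
sharpe = (mu_p - r_f) / sigma_p        # Sharpe ratio
```

Note that the portfolio volatility (about 0.18) is lower than the weighted average of the two single-asset volatilities (0.25), which is the diversification effect the covariance matrix captures.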
Key Takeaways
- Most, if not all, financial models like MPT or CAPM rest on the assumption that returns of securities are normally distributed.
- Testing for normality is important.
- When stock returns are normally distributed, optimal portfolio choice can be cast into a setting where only the mean return, the variance of the returns, and the covariances between different stocks are relevant for an optimal portfolio composition.
- When stock returns are normally distributed, the prices of single stocks are in a linear relationship to a market index.
- [[Compounds, integration, log-returns]]
Bayesian Statistics
What it is
In essence: Bayesian statistics is to update our prior probability (initial belief) about something into a posterior probability (updated belief) after considering what actually happened (the data).
Implementation
Requires pymc package.
Why Random Walk
Refer to: This note.
Machine Learning
Unsupervised
- k-means (These data BELONG to this subset.)
- Gaussian Mixture (These data are xx% likely to belong to this subset.)
Codifying: a general way:
- Import model class
- Instantiating model object
- Fit model to data
- Predict the outcome(Clusters)
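The four steps, sketched with scikit-learn's `KMeans` on two made-up, well-separated blobs of points (the data, the number of clusters, and the seeds are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans                        # 1. import model class

rng = np.random.default_rng(0)

# Hypothetical sample data: two clearly separated 2-d blobs.
X = np.vstack([
    rng.normal(loc=-5.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0)   # 2. instantiate model object
model.fit(X)                                              # 3. fit model to data
labels = model.predict(X)                                 # 4. predict the clusters
```

Other scikit-learn models follow the same import / instantiate / fit / predict rhythm, which is why the workflow generalizes.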
Supervised
Focuses:
- Classification problem
- Estimation problem
Algos:
- Gaussian Naive Bayes
- Logistic Regression
- Decision Trees
- Deep Neural Networks
- Support Vector Machines
#KIM Refer to This to learn more.
Part 4 Algo Trading
Chap14 API
Generalization of workflow:
- Get Started with Accounts and APIs
- Data Retrieval:
  - Tick Data
  - Candles Data
  - Historical Data
  - Streaming Data
- Placing Orders
  - Buy
  - Sell
- Manage Account
  - Get balance
  - Get Margin
  - …
Chap 15 Trading Strats(Algo Trading)
- Backtesting
- Buy-and-Hold Benchmark
- RWH and EMH Benchmark
- Train/Test Splits(Avoid overfitting)
- Sequential
- Randomized
- Strats
- SMA(Simple Moving Average)
- Regression Methods: OLS Regression
- ML Methods
- Classification
- Logistic Regression
- Gaussian Naive Bayes
- Support Vector Machines(SVMs)
- Clustering
- k-means cluster
- DNNs
- Classification
- Frequency Approach
Chap 16 Automate
Get hands-on with APIs and some basic operations:
- Retrieve data
  - Historical
  - Streaming
- Place orders
  - Buys
  - Sells
- Check account status
#KIM :
- Vectorized backtesting only tests to a point.
Kelly Criterion
Finding the best fraction/leverage for the capital to be traded with.
To be more conservative, go for Half Kelly, since Full Kelly aims for a very high (though de facto “best”) leverage and induces more risk.
More #KIM to be found on my notion notes about principles, KIMs, and experiences.
The optimal fraction/leverage:
$$ f^*=\frac{\mu-r}{\sigma^2} $$
- $f^*$: The optimal fraction/leverage.
- $\mu$: Expected return of the stock.
- $\sigma$: The standard deviation of returns (volatility).
- $r$: Constant short rate, defaults to $0$.
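The formula as a tiny function, with made-up annualized inputs (10% expected return, 20% volatility). The function name `kelly_fraction` is hypothetical:

```python
def kelly_fraction(mu, sigma, r=0.0):
    """Optimal Kelly fraction/leverage f* = (mu - r) / sigma**2."""
    return (mu - r) / sigma ** 2


# Hypothetical annualized inputs (made-up numbers).
mu, sigma = 0.10, 0.20

f_full = kelly_fraction(mu, sigma)   # full Kelly, roughly 2.5x leverage here
f_half = 0.5 * f_full                # half Kelly: the more cautious variant
```

Because $\sigma^2$ sits in the denominator, small errors in the volatility estimate move $f^*$ a lot, which is one practical argument for Half Kelly.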
Risk Analysis
- Drawdown
- VaR (Value-at-Risk)
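A toy sketch of both measures on a made-up equity curve. Drawdown is the gap between the running maximum and the current value; the VaR line uses the simple historical (percentile) method, and with only six P&L observations it is purely illustrative, since a real estimate needs far more data:

```python
import numpy as np
import pandas as pd

# Hypothetical equity curve of a strategy (made-up numbers).
equity = pd.Series([100.0, 110.0, 105.0, 120.0, 90.0, 95.0, 130.0])

cummax = equity.cummax()                   # running maximum
drawdown = cummax - equity                 # absolute drawdown at each point
max_dd = drawdown.max()                    # maximum drawdown
max_dd_pct = (drawdown / cummax).max()     # maximum drawdown relative to the peak

# Historical VaR: the loss not exceeded with 90% confidence,
# read off the 10th percentile of the period-to-period P&L.
pnl = equity.diff().dropna()
var_90 = -np.percentile(pnl, 10)
```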
Refer to my notion notes: