MBA candidate · Software engineer

Anish Thakker

Six small projects I built to understand how markets actually work, using quantitative and behavioral finance principles. Built from scratch, tested honestly, free to run.

Before business school I was a software engineer at Amazon and VMware, building backend systems where reproducibility and correctness were the core of the work. I carried that same discipline into Wharton, where I have spent my first year studying how markets are priced and, in particular, where they are priced inefficiently.

These six projects draw on both halves of that background. Each is inspired by a concept from my Behavioral Finance coursework at Wharton and built with the engineering rigor I developed writing production software. Each one takes an idea from the literature, implements it from the ground up, and tests whether it survives once you account for transaction costs and look-ahead bias (letting a backtest see data it could not have had at the time). A consistent theme runs through the set: the places where markets misprice assets tend to be the places where investors behave in predictably irrational ways, and that is where the behavioral research and the data line up most clearly.

Everything here is downloadable and runs end to end with a single command. Select a project to read the write-up and download the code.

Now
MBA candidate, The Wharton School
Before
Software Engineer · Amazon, VMware
Focus
Hedge funds · AI × quantitative finance
01 — Derivatives

Options Pricing & the Volatility Surface

Price an option three different ways, then read the market's fear off the shape of the curve.

Download .zip · 528 KB · Python

New here? Read the README inside the zip for full setup and method notes.

Description
A from-scratch options library. Closed-form Black–Scholes with analytic Greeks, binomial and trinomial trees, and a Monte Carlo engine for payoffs that have no clean formula. On top of the pricers, an implied-volatility solver that fits a full surface across strikes and maturities from a real option chain.
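As a rough illustration of the closed-form pricer and the implied-volatility step described above, here is a minimal sketch. It is not the package's actual code: the function names (`bs_call`, `implied_vol`), the bisection bounds, and the sample inputs are all assumptions chosen for the example, and it covers only a European call with no dividends.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot: float, strike: float, rate: float, vol: float, maturity: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def implied_vol(price, spot, strike, rate, maturity, lo=1e-4, hi=5.0, tol=1e-8):
    """Back out volatility by bisection; this works because the call price rises with vol."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, maturity) > price:
            hi = mid  # model price too high, so the true vol is lower
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Round trip with made-up inputs: price at 25% vol, then recover the 25%.
quoted = bs_call(spot=100.0, strike=105.0, rate=0.02, vol=0.25, maturity=0.5)
recovered = implied_vol(quoted, spot=100.0, strike=105.0, rate=0.02, maturity=0.5)
print(f"price={quoted:.4f}  implied vol={recovered:.4f}")
```

Running the solver on every strike and maturity in a chain gives the grid of implied vols that a surface would then be fitted to. Bisection is slower than Newton's method but cannot diverge, which is why it is used in this sketch.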
Deliverable
A Python package with the three pricing methods, the surface fitter, and a set of plots: the 3D implied-vol surface, the per-maturity smile, and a convergence check that shows the tree and Monte Carlo prices collapsing onto Black–Scholes for vanilla options.
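The convergence check can be pictured with a small example. The sketch below prices one vanilla call with a Cox-Ross-Rubinstein binomial tree at increasing step counts and compares each result with the Black–Scholes value. The tree parameterization, the function names, and the inputs are assumptions made for illustration, not the package's own implementation.

```python
import math

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes European call (no dividends), used here as the reference price."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt_t)
    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d1 - vol * sqrt_t)

def crr_call(spot, strike, rate, vol, maturity, steps):
    """European call on a Cox-Ross-Rubinstein binomial tree."""
    dt = maturity / steps
    up = math.exp(vol * math.sqrt(dt))
    down = 1.0 / up
    p_up = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    discount = math.exp(-rate * dt)
    # Terminal payoffs, indexed by the number of up-moves j.
    values = [max(spot * up**j * down**(steps - j) - strike, 0.0) for j in range(steps + 1)]
    # Roll back one step at a time to the root.
    for n in range(steps, 0, -1):
        values = [discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j]) for j in range(n)]
    return values[0]

reference = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
errors = {n: abs(crr_call(100.0, 100.0, 0.05, 0.2, 1.0, n) - reference) for n in (50, 200, 800)}
for n, err in errors.items():
    print(f"{n:>4} steps: abs error vs Black-Scholes = {err:.5f}")
```

The error shrinks roughly in proportion to one over the number of steps, which is the "collapsing onto Black–Scholes" behavior the plot is meant to show.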
How to use
Unzip the folder and install the dependencies with pip install -r requirements.txt. Run python run.py to rebuild everything from scratch; the pricers, the fitted volatility surface, and the plots all regenerate into the results/ folder. The README has the method notes and the validation tests if you want to go deeper.
Inspiration
Black–Scholes assumes one constant volatility. Real markets don't agree with that for a second, and the way they disagree (the smile, the skew) is one of the most picked-over objects in all of finance. I wanted to build the machinery well enough to actually see that shape in live data instead of just reading about it.
Why it matters
Pricing and Greeks sit underneath every derivatives desk and risk system there is. Building all three methods and watching them agree to the penny is the cleanest way I know to trust that the foundation is solid before anything gets built on top of it.
Behavioral angle
The skew (out-of-the-money puts trading richer than a symmetric model says they should) is partly a behavioral fingerprint. Investors overweight the odds of a crash relative to how often crashes actually happen, which is the tail-overweighting that prospect theory predicts. You can read collective fear straight off the curve.
02 — Statistical arbitrage

Cointegration Pairs Trading

Find two stocks that wander apart, then bet they snap back together.

Download .zip · 1.6 MB · Python

Description
A complete pairs-trading pipeline. It screens a universe for cointegrated pairs, models the spread between them as a mean-reverting process, tracks the hedge ratio with a Kalman filter so it adapts over time, and trades when the spread stretches too far from equilibrium.
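To make the adaptive hedge ratio concrete, here is a minimal sketch of a scalar Kalman filter that treats the ratio as a slowly drifting random walk. The state model (no intercept), the noise settings `q` and `r`, and the synthetic price series are all assumptions for illustration; the project's own filter may be specified differently.

```python
import numpy as np

def kalman_hedge_ratio(x, y, q=1e-5, r=1e-2):
    """Track beta_t in y_t = beta_t * x_t + noise, with beta_t following a random walk.

    q is the assumed variance of the drift in beta, r the assumed observation noise variance.
    """
    beta, p = 0.0, 1.0                      # initial estimate and its variance
    betas = np.empty(len(x))
    for t in range(len(x)):
        p = p + q                           # predict: beta may have drifted
        innovation = y[t] - beta * x[t]     # what the current beta failed to explain
        s = x[t] * p * x[t] + r             # variance of the innovation
        gain = p * x[t] / s                 # Kalman gain
        beta = beta + gain * innovation     # update the estimate
        p = (1.0 - gain * x[t]) * p         # update its variance
        betas[t] = beta
    return betas

# Synthetic pair with a true hedge ratio of 1.5 (made-up data, not market prices).
rng = np.random.default_rng(0)
x = 100.0 + np.cumsum(rng.normal(0.0, 0.5, 500))
y = 1.5 * x + rng.normal(0.0, 0.5, 500)
betas = kalman_hedge_ratio(x, y)
spread = y - betas * x                      # the series a trading rule would watch
print(f"final hedge ratio estimate: {betas[-1]:.3f}")
```

Each estimate uses only observations up to that point, so the spread built from it never depends on future prices.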
Deliverable
An engine that screens and ranks pairs with the Engle–Granger test and reports their spread half-lives, a Kalman-filtered spread model, and a backtest that forms pairs in-sample and trades them out-of-sample. It returns Sharpe, drawdown, turnover, and per-pair P&L.
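A spread's half-life is one common way to express how quickly it mean-reverts. The sketch below estimates it by regressing the change in the spread on its lagged level, an AR(1) approximation. The function name and the simulated series are assumptions for the example, and the project's estimator may differ.

```python
import numpy as np

def half_life(spread):
    """Estimate mean-reversion half-life (in periods) from an AR(1) fit.

    Regress the change in the spread on its lagged level: delta_t = a + b * s_{t-1}.
    The implied AR(1) coefficient is phi = 1 + b.
    """
    lagged = spread[:-1]
    delta = np.diff(spread)
    design = np.column_stack([np.ones_like(lagged), lagged])
    (a, b), *_ = np.linalg.lstsq(design, delta, rcond=None)
    phi = 1.0 + b
    if not 0.0 < phi < 1.0:
        return float("inf")                 # no measurable mean reversion
    return float(-np.log(2.0) / np.log(phi))

# Simulated AR(1) spread with phi = 0.9, so the true half-life is about 6.6 periods.
rng = np.random.default_rng(42)
shocks = rng.normal(0.0, 1.0, 20_000)
spread = np.zeros_like(shocks)
for t in range(1, len(shocks)):
    spread[t] = 0.9 * spread[t - 1] + shocks[t]
estimated = half_life(spread)
print(f"estimated half-life: {estimated:.2f} periods")
```

A short half-life means the gap tends to close quickly, which is one reason to report it alongside the cointegration test when ranking pairs.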
How to use
Unzip it and install the requirements with pip install -r requirements.txt, then run python run.py. That runs the full loop: it screens the universe for cointegrated pairs, fits the spreads, and writes the backtest metrics along with the spread and equity-curve plots into results/. The README explains the trading rules and the in-sample versus out-of-sample split.
Inspiration
Two companies in the same business should move together. When they don't, either something real changed or somebody overreacted. Classic stat-arb is a bet on the second case. I wanted to build the full screen-to-backtest loop rather than rehearse the textbook two-stock example one more time.
Why it matters
It's a whole research workflow in miniature: hypothesis, statistical test, model, and an honest out-of-sample evaluation. The discipline of keeping the pairs you pick separate from the period you test on is the part most write-ups quietly skip, and it's the part that decides whether a strategy is real or imaginary.
Behavioral angle
Mean reversion in a spread is overreaction correcting itself. When one name sells off harder than the news justifies, the gap that opens is the market's overshoot, and the trade is a bet that it closes. That's De Bondt and Thaler's overreaction hypothesis playing out at the level of a single pair.
03 — Alternative data

A News-Sentiment Signal

Turn a headline into a number, then ask whether it tells you anything about tomorrow.

Download .zip · 232 KB · Python

Description
A sentiment signal built from news headlines. It scores each headline with a finance-specific lexicon, aggregates to a daily reading per stock, and tests whether that reading has any predictive relationship with next-day returns.
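The scoring and aggregation steps can be sketched in a few lines. The word lists below are a tiny made-up stand-in for a real finance lexicon, and the function names and sample headlines are hypothetical; the sketch only shows the shape of the computation.

```python
from collections import defaultdict

# Toy word lists for illustration only; a real finance lexicon is far larger.
POSITIVE = {"beats", "upgrade", "record", "growth"}
NEGATIVE = {"misses", "downgrade", "lawsuit", "recall"}

def score_headline(text: str) -> float:
    """Net tone in [-1, 1]: (positive hits - negative hits) / total hits, or 0 if no hits."""
    words = [w.strip(".,!?:;\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def daily_sentiment(headlines):
    """Average headline scores per (date, ticker). headlines: iterable of (date, ticker, text)."""
    buckets = defaultdict(list)
    for date, ticker, text in headlines:
        buckets[(date, ticker)].append(score_headline(text))
    return {key: sum(scores) / len(scores) for key, scores in buckets.items()}

# Hypothetical headlines for two made-up tickers.
sample = [
    ("2024-01-02", "AAA", "AAA beats estimates, posts record growth"),
    ("2024-01-02", "AAA", "AAA faces lawsuit over recall"),
    ("2024-01-02", "BBB", "BBB announces new office"),
]
daily = daily_sentiment(sample)
print(daily)
```

Because every score traces back to specific words, it is easy to inspect why a given day read as positive or negative.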
Deliverable
A pipeline that runs from a bundled headline dataset to a tradable signal, reporting an information coefficient, a signal-decay curve, and an event study around the days with the strongest sentiment. Timestamps are aligned so the signal only ever sees news that was public before the trade.
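One way to enforce that alignment is to map every headline to the first trading session at which it could legitimately be acted on. The sketch below assumes trades happen at the close, a 16:00 cutoff, and timestamps already in exchange-local time. All three are assumptions for illustration, as are the function name and the sample calendar.

```python
import bisect
from datetime import date, datetime, time, timedelta
from typing import Optional

MARKET_CLOSE = time(16, 0)  # assumed cutoff, in exchange-local time

def signal_date(published: datetime, trading_days: list) -> Optional[date]:
    """First trading day whose close comes after the headline was published.

    trading_days must be a sorted list of dates. Returns None if the headline
    arrives after the last session in the calendar.
    """
    day = published.date()
    if published.time() >= MARKET_CLOSE:
        day += timedelta(days=1)            # too late for today's close
    i = bisect.bisect_left(trading_days, day)
    return trading_days[i] if i < len(trading_days) else None

# Small calendar: Friday, then Monday and Tuesday of the following week.
calendar = [date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 9)]
print(signal_date(datetime(2024, 1, 5, 10, 30), calendar))  # Friday morning
print(signal_date(datetime(2024, 1, 5, 16, 30), calendar))  # Friday after the close
print(signal_date(datetime(2024, 1, 6, 12, 0), calendar))   # Saturday
```

With this mapping, a headline published after Friday's close or over the weekend first enters the signal on Monday, so it cannot leak into a trade that was already placed.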
How to use
Unzip, install with pip install -r requirements.txt, and run python run.py. It scores the bundled headlines, builds the signal, and drops the information-coefficient plot, the event study, and the equity curve into results/. The README explains the timestamp alignment and how to point it at your own headline data.
Inspiration
Everyone repeats that "the market reacts to news." I wanted to pin down whether a crude, transparent sentiment score actually carries information, or whether it evaporates the moment you account for cost and timing. Using a plain lexicon instead of a black-box model kept the answer readable.
Why it matters
Most of the alpha people claim from text data dies on the alignment problem, which is a polite name for accidentally trading on tomorrow's news. Forcing strict timestamps and building the event study around them is the real point of the project. It's a study in not fooling yourself.
Behavioral angle
News-driven trading is soaked in attention and overreaction effects. Barber and Odean showed retail investors crowd into attention-grabbing stocks, and the drift after a big news day is the market under-reacting and then over-reacting in sequence. The signal is an attempt to put a number on that reflex.
04 — Asset allocation

A Regime-Switching Allocator

Let the data decide when the market's mood has flipped, then move with it.

Download .zip · 324 KB · Python

Description
A two-state hidden Markov model fit to market returns and volatility. It infers whether the market is in a calm regime or a turbulent one and shifts a portfolio between stocks and bonds accordingly.
Deliverable
A regime detector, a backtest of the regime-driven allocation against a static 60/40 split, and a price chart shaded by whichever regime the model believed it was in at the time. The inference is strictly causal: it only uses information available up to that day.
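Causal inference in a hidden Markov model is usually done with the forward (filtering) recursion, which conditions each day's regime probability only on returns seen so far. The sketch below shows that recursion for two Gaussian states. The parameter values are made up for illustration, and it assumes the parameters were themselves estimated on earlier data; it is not the project's code.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def filtered_turbulent_prob(returns, means, stds, stay_prob, prior=(0.5, 0.5)):
    """Forward filter: P(turbulent on day t | returns up to and including day t).

    Index 0 is the calm state, index 1 the turbulent state.
    stay_prob[i] is the probability of remaining in state i from one day to the next.
    """
    p_calm, p_turb = prior
    probs = []
    for r in returns:
        # Predict: push yesterday's beliefs through the transition probabilities.
        pred_calm = p_calm * stay_prob[0] + p_turb * (1.0 - stay_prob[1])
        pred_turb = p_calm * (1.0 - stay_prob[0]) + p_turb * stay_prob[1]
        # Update: reweight by how well each state explains today's return.
        joint_calm = pred_calm * gaussian_pdf(r, means[0], stds[0])
        joint_turb = pred_turb * gaussian_pdf(r, means[1], stds[1])
        total = joint_calm + joint_turb
        p_calm, p_turb = joint_calm / total, joint_turb / total
        probs.append(p_turb)
    return probs

# Made-up daily returns: three quiet days, then three violent ones.
rets = [0.001, -0.002, 0.0005, -0.04, 0.035, -0.05]
params = dict(means=(0.0005, -0.001), stds=(0.007, 0.03), stay_prob=(0.98, 0.95))
probs = filtered_turbulent_prob(rets, **params)
print([round(p, 3) for p in probs])
```

The contrast is with smoothed probabilities, which revise each day's estimate using later data and would make a backtest look better than anything achievable in real time.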
How to use
Unzip the folder, run pip install -r requirements.txt, then python run.py. The model fits, infers the regimes causally, and writes the shaded regime chart and the performance comparison against a 60/40 split into results/. The README has the notes on why the inference stays strictly past-only.
Inspiration
Markets clearly have moods. Volatility clusters, and calm and panic don't arrive at random. I wanted to see whether a simple statistical model could catch the switch early enough to be useful, without cheating by looking at the whole history at once.
Why it matters
The tempting bug here is to fit the regimes on the full dataset and then "discover" you could have sidestepped every crash. Making the model decide in real time, with only the past in hand, is the line between a backtest and a fantasy. That constraint was the actual engineering problem worth solving.
Behavioral angle
Regimes are partly a story about fear. The turbulent state lines up with flight-to-safety and the volatility clustering that comes from investors herding into and out of risk together. The model is really putting a probability on collective panic.
05 — Portfolio construction

A Black–Litterman Engine

Blend the market's implied view with your own, then watch overconfidence wreck the result.

Download .zip · 148 KB · Python

Description
An implementation of the Black–Litterman model. It starts from the returns the market's own weights imply, lets you layer specific views on top with a stated confidence in each, and produces a blended set of expected returns to optimize against.
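The core of the model fits in two formulas: reverse optimization to get the equilibrium returns, and a precision-weighted blend of those returns with the views. The sketch below uses the standard textbook form with a three-asset covariance matrix, market weights, risk aversion, and `tau` that are all made up for illustration. It also reuses the prior covariance when converting returns to weights, a simplification rather than the full posterior covariance.

```python
import numpy as np

def implied_returns(cov, market_weights, risk_aversion=2.5):
    """Reverse optimization: the returns under which the market weights are optimal."""
    return risk_aversion * cov @ market_weights

def posterior_returns(prior, cov, P, q, omega, tau=0.05):
    """Black-Litterman blend of equilibrium returns with views P @ mu = q (uncertainty omega)."""
    prior_precision = np.linalg.inv(tau * cov)
    view_precision = np.linalg.inv(omega)
    A = prior_precision + P.T @ view_precision @ P
    b = prior_precision @ prior + P.T @ view_precision @ q
    return np.linalg.solve(A, b)

def unconstrained_weights(mu, cov, risk_aversion=2.5):
    """Mean-variance weights with no constraints, using the prior covariance (a simplification)."""
    return np.linalg.solve(risk_aversion * cov, mu)

# Made-up three-asset example.
cov = np.array([[0.0400, 0.0060, 0.0020],
                [0.0060, 0.0100, 0.0010],
                [0.0020, 0.0010, 0.0025]])
market_weights = np.array([0.6, 0.3, 0.1])
prior = implied_returns(cov, market_weights)

# One absolute view: asset 0 will return 10%. Sweep the stated uncertainty on that view.
P = np.array([[1.0, 0.0, 0.0]])
q = np.array([0.10])
for view_variance in (1e6, 1e-3, 1e-8):     # from "no idea" to "certain"
    mu = posterior_returns(prior, cov, P, q, np.array([[view_variance]]))
    w = unconstrained_weights(mu, cov)
    print(f"view variance {view_variance:g}: weights {np.round(w, 3)}")

w_loose = unconstrained_weights(posterior_returns(prior, cov, P, q, np.array([[1e6]])), cov)
w_tight = unconstrained_weights(posterior_returns(prior, cov, P, q, np.array([[1e-8]])), cov)
```

With a very uncertain view the weights stay at the market portfolio. As the stated uncertainty shrinks toward zero, the allocation to the asset named in the view grows, which is the mechanism behind a confidence sweep.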
Deliverable
A working engine over a handful of asset-class ETFs, plus a demonstration that sweeps the confidence on a view and shows how the resulting portfolio shifts, including the case where overstated confidence drives the allocation to absurd extremes.
How to use
Unzip, install the requirements with pip install -r requirements.txt, and run python run.py. It builds the equilibrium and posterior portfolios and generates the weight-comparison and confidence-sensitivity plots in results/. The README walks through how to set your own views and confidence levels.
Inspiration
Naive mean-variance optimization is famously fragile. Nudge your return estimates a little and the portfolio swings wildly. Black–Litterman is the elegant fix, and I built it partly to understand why anchoring to market equilibrium tames that instability so well.
Why it matters
This is the model that gets opinion and market consensus to play nicely together, which is most of what active allocation actually is. Writing the math out made the role of the confidence parameter concrete in a way that reading about it never managed.
Behavioral angle
The confidence term is a direct dial for overconfidence. Tell the model you're more certain of a view than you have any right to be, and it pours the whole portfolio into that one bet. The confidence sweep is a small, self-contained picture of how miscalibrated conviction blows a portfolio up.
06 — Cross-sectional anomaly

The Lottery Effect

Go long the boring stocks and short the ones that look like lottery tickets.

Download .zip · 3.6 MB · Python

Description
A cross-sectional strategy built on a single signal: each stock's largest daily returns over the past month. It goes long the stocks with the tamest recent peaks and short the ones with the most extreme, rebalancing monthly.
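A minimal sketch of the signal and the sort, under stated assumptions: it takes MAX to be the single largest daily return in the month, weights stocks equally within each leg, and runs on random numbers rather than real returns. The function names are hypothetical and the project's construction may differ on each of these points.

```python
import numpy as np

def max_signal(daily_returns):
    """Largest daily return per stock. daily_returns has shape (days_in_month, n_stocks)."""
    return daily_returns.max(axis=0)

def long_short_weights(signal, n_buckets=5):
    """Equal-weight long the lowest-signal bucket and short the highest-signal bucket."""
    order = np.argsort(signal)              # stock indices from lowest MAX to highest
    bucket_size = len(signal) // n_buckets
    weights = np.zeros(len(signal))
    weights[order[:bucket_size]] = 1.0 / bucket_size    # long the tamest stocks
    weights[order[-bucket_size:]] = -1.0 / bucket_size  # short the most lottery-like
    return weights

# Synthetic month: 21 trading days for 10 stocks (random numbers, not market data).
rng = np.random.default_rng(1)
month = rng.normal(0.0, 0.02, size=(21, 10))
signal = max_signal(month)
weights = long_short_weights(signal)
print(np.round(weights, 2))
```

The weights sum to zero, so the portfolio is dollar-neutral. In a backtest, the signal from one month would set the positions held over the following month.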
Deliverable
The full backtest, from signal construction through quintile sorts to a long-short portfolio, with a Fama–French regression to check whether the returns survive once you control for the usual factors.
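The factor check comes down to an ordinary least-squares regression of the strategy's returns on the factor returns, where the intercept (alpha) is what is left after the factors have had their say. The sketch below runs that regression on simulated data with a known alpha. The factor series, loadings, and noise level are invented for the example, and it uses plain OLS standard errors rather than anything robust to autocorrelation.

```python
import numpy as np

def factor_regression(strategy_returns, factors):
    """OLS of strategy returns on factors. Returns (alpha, betas, t-statistic of alpha)."""
    design = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(design, strategy_returns, rcond=None)
    residuals = strategy_returns - design @ coef
    dof = len(strategy_returns) - design.shape[1]
    sigma2 = residuals @ residuals / dof                 # residual variance
    coef_cov = sigma2 * np.linalg.inv(design.T @ design)
    t_alpha = coef[0] / np.sqrt(coef_cov[0, 0])
    return coef[0], coef[1:], t_alpha

# Simulated monthly data: three factors and a strategy with a built-in alpha of 0.4%.
rng = np.random.default_rng(7)
factors = rng.normal(0.005, 0.04, size=(600, 3))
true_alpha, true_betas = 0.004, np.array([-0.3, 0.5, 0.2])
strategy = true_alpha + factors @ true_betas + rng.normal(0.0, 0.01, 600)

alpha, betas, t_alpha = factor_regression(strategy, factors)
print(f"alpha={alpha:.4f}  betas={np.round(betas, 2)}  t(alpha)={t_alpha:.1f}")
```

If the long-short returns were fully explained by the factors, alpha would be statistically indistinguishable from zero.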
How to use
Unzip it, run pip install -r requirements.txt, then python run.py. That builds the MAX signal, runs the quintile sorts and the long-short backtest, and writes the factor regression and the plots into results/. The README covers the signal construction and the Fama–French controls.
Inspiration
There's a robust, slightly uncomfortable finding that stocks with lottery-like payoffs are systematically overpriced. People pay up for the small chance of a jackpot, and that demand pushes expected returns down. I wanted to reproduce it with my own hands rather than take it on faith.
Why it matters
It's one of the cleanest cases of a behavioral bias leaving a fingerprint you can actually trade. The strategy is almost embarrassingly simple, which is exactly the point. The edge comes from the psychology, not from any clever engineering.
Behavioral angle
This is the lottery-preference effect, the probability-weighting half of prospect theory made tradable. Investors overweight tiny chances of a big gain, overpay for the stocks that dangle them, and earn lower returns for it. Long boring, short flashy harvests the spread (Bali, Cakici, and Whitelaw).