What I Learned in Two Years’ Tech Work in Finance

It’s been two years since I started working full time as a software engineer. As I accumulated experience, I became a lot more hesitant to write, because I started to feel that I couldn’t contribute anything new on top of what everyone else already knows. I even felt bad for having written some of the old posts, since they now seem quite silly and naive.

In some sense those feelings must reflect a lot of truth, but that still shouldn’t stop me from writing. Perfect is the enemy of good, and if I wait until I know everything, I’ll never write again; hence this post. Random thoughts will be laid out in no particular order.


One mistake that I’ve repeated is to optimize prematurely. As a recent college grad who had done a lot of brain teasers, I found it really tempting to work clever algorithms into the job. But a lot of the time, it’s just unnecessary. In coding competitions, we are only rewarded for writing correct, fast and short programs. Nothing else matters. In professional work, we need to add a few more terms to the equation – cost of human effort (to write, test and review the code), flexibility for future modifications, and simplicity of the solution. A simple solution that gets 90% of the cases right is perhaps even better than a complicated solution that gets 99.99%, if humans can much more easily understand its failure modes and manually fix things in the former case. After all, the alternative is to spend a lot of time debugging when the one edge case happens and breaks the system.

I think this is important enough to deserve an emphasis – simplicity is valuable. A prediction system that gives you a slightly inaccurate number in a predictable way is much better than one that gives you a more accurate magic number that no one understands. In financial markets, complexity creeps in wherever competition is fierce, but the simplicity of many models would probably still be surprising to outsiders (hint: it’s not all machine learning).

There’s another form of short-sighted cleverness, which is to tweak the system very slightly to achieve what I want without learning about the whole system or understanding the full consequences of those tweaks. The smallest diff isn’t necessarily the best fix. Adding a patch that isn’t well thought out and that doesn’t fit in with the rest of the code is just incurring tech debt. Perhaps we could call this under-engineering.

Don’t Pretend You Understand

One thing that is common, perhaps more so among newer folks, is to pretend to understand, whether listening to a conversation or getting answers from teammates. Having been on both sides, I believe that this is a really bad habit. Of course it is only human nature to hide your inexperience, and I’ve also heard criticism of people who ask for help before spending much time on an issue themselves. But no, I think a lot of the time, asking questions eagerly is way more productive overall, especially for new teammates.

I say this multiple times to my interns: if a problem can possibly take you half an hour to figure out, and I already know the answer, then the decision here is between one minute of my time versus thirty minutes of yours. Is my time 30x more valuable? I wish! If it’s a tough question and the mentor doesn’t already know the answer, then it seems even sillier for the intern to struggle alone for a long time.

There are times when I’m answering questions from newer folks, and I know that there’s no way they understood some statements I made. However, they were still nodding and reacting as if they did. Invariably they will return in a few days with the same questions. This is counterproductive.

This problem is less common among more experienced people, although perhaps more serious, since pride has built up. I’ve told myself “this is something I should know by now” multiple times and stopped myself from asking my coworkers. But the truth is no one knows every corner of the system, and everyone knows that.

There is no shame in not knowing things. People expect that. Just ask.

Aligned Incentives

One idiosyncrasy about the finance industry is that a lot of compensation comes in the form of variable and unpredictable year end bonuses, as compared to stock offerings in tech companies. Of course people like certainty. But I think there’s a case for an opaque process of reward in the form of bonuses.

From first principles, employees have different incentives, and they usually aren’t the same as the company’s goals. Whenever incentives diverge or even conflict, we can get serious issues.

There are countless examples in real life. One example in finance is the reward curve for some hedge funds. Hedge funds are roughly companies that take investors’ money and pick investments for them. Some hedge funds collect a fixed fee, plus a significant fraction of any additional return. That means when the investments increase in value, they make more money. That’s a good incentive, right? The problem is that when the investments lose money, the funds aren’t affected – they still take the fixed fee, regardless of how much was lost. (This is not entirely accurate, since they will lose clients if they keep losing money.) Therefore the funds will tend to make riskier investments. If your investment is going to make on average 10% a year, you’d rather make 100% this year and lose ~80% next year, as you can collect a much larger fee. This is worse for the clients.
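To put rough numbers on that last comparison, here is a minimal sketch. The fee terms are assumptions of mine, not something any particular fund uses: a 2% fixed fee on assets at the start of each year, a 20% cut of any yearly gain, no clawback on losses, and (for simplicity) fees tallied separately rather than deducted from the assets. The function name `total_fees` is made up for this illustration.

```python
def total_fees(yearly_returns, start_assets=100.0, fixed_rate=0.02, perf_rate=0.20):
    """Return (total fees collected, final client assets) over a list of yearly returns.

    Assumed fee model (hypothetical): fixed fee on start-of-year assets plus
    a share of positive gains only. Fees are not deducted from the assets.
    """
    assets = start_assets
    fees = 0.0
    for r in yearly_returns:
        gain = assets * r
        fees += fixed_rate * assets          # fixed fee, charged regardless of performance
        fees += perf_rate * max(gain, 0.0)   # performance fee, only when there is a gain
        assets += gain
    return fees, assets


steady_fees, steady_assets = total_fees([0.10, 0.10])    # +10%, then +10%
risky_fees, risky_assets = total_fees([1.00, -0.80])     # +100%, then -80%

print(f"steady: fund collects {steady_fees:.1f}, client ends with {steady_assets:.1f}")
print(f"risky:  fund collects {risky_fees:.1f}, client ends with {risky_assets:.1f}")
```

Under these assumptions the fund collects about 8.4 on the steady path and about 26 on the risky one, while the client ends with 121 versus 40 for every 100 invested.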

Now let’s look at employee compensation. We want to reward employees in a way that encourages them to help the company make more money (assuming making money is the goal of the company). One thing we can do is to measure the amount of contributions for everyone, and reward accordingly. For example, we could measure hours spent in the office, or number of lines of code written, or survey people for their estimations of their teammates’ contributions, etc. The problem is these are only proxy measurements, and once you start measuring them, people will optimize for the proxies instead of the actual goal. If you measure lines of code written, you’ll encourage verbose and redundant code; if you measure hours in the office, people will stay longer but not necessarily work at the same speed, and so on.

But the problem is fundamental – you can’t measure the actual contribution and hard work, and by measuring proxies you’ll encourage cynical behavior. A fix, if not a complete solution, is to obfuscate the reward function. If I tell you that I’ll give you an unknown amount of money by the end of the year based on How Well You Did™, and let you fill in the rest, then you won’t be encouraged to write bad code, or to focus only on projects that have Impact, or to do other things that are bad for the firm.

I feel that this has worked quite well in my company. But there are a lot of assumptions for this to work. One is that employees have to be OK with not knowing how much they’ll make. There also needs to be a lot of trust between employees and managers, so that employees can trust that they’ll be evaluated fairly by the end of the year.

And More…

This post is getting long and messy, so maybe I’ll call it a day for now. There are a lot of smaller lessons that come from trading and recruiting. Trading is arguably the best arena to hone one’s rational decision making skills, and interesting stories come up every now and then. Maybe I’ll write a follow up some day.

Thoughts on Fooled by Randomness

Just finished Nassim Nicholas Taleb’s well-known book, Fooled by Randomness. Here are some brief thoughts, in no particular order.

The Birthday Irony

Despite the author’s years working in trading and writing a book on probability, in one of the few cases where he did actual math, he did it wrong. Here’s the original:

If you meet someone randomly, there is a one in 365.25 chance of your sharing their birthday, and a considerably smaller one of having the exact birthday of the same year.

Nassim Nicholas Taleb, Fooled by Randomness

It seems like he was trying to say – on average, there are 365.25 days a year (first order approximation of leap years), so you have a \frac{1}{365.25} chance of meeting someone with the same birthday.

If you do the math though, here’s the actual probability: every four years (365 \times 4+1 = 1461 days), there are 1460 days in which your probability of sharing a birthday is \frac{4}{1461}, and 1 day in which it is \frac{1}{1461}. So, the probability is \frac{1460}{1461} \times \frac{4}{1461} + \frac{1}{1461} \times \frac{1}{1461} \approx \frac{1}{365.44}. That’s far enough off from 365.25 that you can’t really say “I just made a first order approximation”.

To fully understand this error, let’s say there is one extra day in n years, instead of 4. Then the number, instead of 365.25 or 365.44, will be (\frac{365n^2}{(365n+1)^2} + \frac{1}{(365n+1)^2})^{-1}. After taking Taylor series expansions, we get 365 + \frac{2}{n} - \frac{364}{365n^2} + O(n^{-3}), or 365 + \frac{2}{n} + O(n^{-2}), instead of the 365 + \frac{1}{n} that the author had guessed.
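The numbers above are easy to check with exact rational arithmetic. The sketch below assumes, as the text does, that birthdays are spread uniformly over the days of an n-year cycle containing one leap day; the function name is mine.

```python
from fractions import Fraction


def inverse_match_probability(n):
    """1 / P(two random people share a birthday), with one leap day every n years."""
    days = 365 * n + 1                 # days in one n-year cycle
    p_regular = Fraction(n, days)      # chance of being born on a given ordinary date
    p_leap = Fraction(1, days)         # chance of being born on the leap day
    p_match = 365 * p_regular**2 + p_leap**2
    return 1 / p_match


for n in (4, 40, 400):
    exact = float(inverse_match_probability(n))
    # Compare the exact value with the two candidate approximations.
    print(n, exact, 365 + 2 / n, 365 + 1 / n)
```

For every n tried, the exact value sits closer to 365 + 2/n than to 365 + 1/n. At n = 4 the second-order term is still visible, which is why the exact answer is about 365.44 rather than 365.5.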

Let’s spend a little time to gain some intuition on why it’s 365 + \frac{2}{n} instead of 365 + \frac{1}{n}. Consider Alice and Bob, and suppose a year is exactly 365 days. Then the chance of sharing a birthday is 1 in 365. Now say we add x days to Bob’s calendar only, so Bob’s birthday has 365+x possible choices while Alice still has 365. Then, the probability that they have the same birthday is 1 in 365+x. At this point, it is clear that if we also add x days to Alice’s calendar, the chance of sharing a birthday goes down further, therefore we know that the author’s estimate of the probability is too high. Then, add x to Alice’s calendar. If x is small, we can ignore the probability that their shared birthday is on one of the days in x (that probability is second order). Then, approximately we have the probability of sharing a birthday as \frac{365}{(365+x)^2}, which is close to \frac{1}{365 + 2x}, again ignoring the second order term. Substituting \frac{1}{n} for x, we arrive at the desired result. The factor 2 comes from the fact that we added a leap day not only to Bob, but also to Alice.

Anyway, on a higher level, the lesson is that you should fully justify your simplifying assumptions, instead of jumping to conclusions.

Wittgenstein’s Ruler

This idea had never explicitly occurred to me, so I thought it was interesting. It says something like: if you don’t have a reliable ruler, and you use it to measure a table, you might be measuring your ruler with the table. One example he mentioned was that some people in finance claimed that a ten sigma event happened. Using the principle – if you measured a ten sigma event, your ruler (mathematical model) is probably seriously flawed.

One takeaway from this is that statistics is merely a language to simplify and describe the real world; the world does not run according to its rules. It would be ridiculous to fit a bell curve to data points, and then say that the world is wrong when a new data point doesn’t fit under it.

Another way of saying the same thing is conditional probability. Relevant xkcd: https://www.xkcd.com/1132/
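As a rough illustration of the conditional-probability view, here is a small Bayes’ rule sketch. Only the ten sigma tail probability comes from the normal distribution; the prior and the chance of such a move under a “wrong model” are hypothetical numbers I picked for the example.

```python
import math


def posterior_model_wrong(prior_wrong, p_event_if_right, p_event_if_wrong):
    """Bayes' rule: P(model is wrong | event observed)."""
    evidence = prior_wrong * p_event_if_wrong + (1 - prior_wrong) * p_event_if_right
    return prior_wrong * p_event_if_wrong / evidence


# Two-sided probability of a move of at least 10 standard deviations
# if the data really were normally distributed.
p_ten_sigma = math.erfc(10 / math.sqrt(2))

# Hypothetical inputs: we were 99.9% sure of the model beforehand, and if the
# model is wrong, a move this large has a one-in-a-million chance.
posterior = posterior_model_wrong(
    prior_wrong=0.001,
    p_event_if_right=p_ten_sigma,
    p_event_if_wrong=1e-6,
)

print(p_ten_sigma, posterior)
```

Even with a strong prior in favor of the model, the posterior probability that the model is wrong comes out essentially equal to 1, because the event is so astronomically unlikely under the normal assumption. The event measured the ruler.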

One way I’ve seen it in real life is the current political situation in Hong Kong. Say there’s a certain probability that one citizen goes nuts and riots in the street, and there’s a certain probability that the government has done something terribly wrong. If you have very few people rioting, then the ruler tells you that those guys are probably at fault. But if you have a majority of citizens supporting the riots or rioting, then those guys become the ruler, and you’re measuring the government.

Think about All Possibilities

One very valid point in the book is that you should think of the world as taking one sample path out of infinitely many possibilities. When you evaluate an outcome, you should think of all the things that could have happened. For example, if your friend did something and it was a huge success, it doesn’t mean he made a good decision, or that you should’ve done the same, or even that you should follow suit. We have only one data point, so we don’t know what the probability distribution looks like. Maybe he could have lost it all. When you think about all that could have happened, you will feel less jealousy toward the lucky and more sympathy for the unfortunate.

Happiness is Relative

This is a tangential point to randomness, but still important to keep in mind. Given that you have basic human needs fulfilled, your happiness often doesn’t depend on how much you have, but on how much more you have compared to those around you. More generally, it’s not the absolute well-being that matters, but the changes. So to be happy, don’t be the medium fish in the big pond; go to the small pond and be a king. If you start out at the top, tough luck, because chances are your status will revert to the mean over time.

Limit Your Loss

If there’s one actionable item from the book, it’s to always remember to limit your worst case scenario. Between a steady increase in personal well-being with no risk of going bankrupt, and more income but also a chance of losing everything, you should prefer the former, because eventually the unfortunate thing will happen. That’s related to ergodicity: mathematically, an event with a fixed nonzero probability per trial will eventually happen, with probability approaching one, if the trial is repeated enough times.
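A quick way to see how fast “eventually” arrives is the sketch below. It assumes the years are independent and that the chance of ruin is the same every year; the 2% figure is a made-up number for illustration.

```python
def probability_of_ruin(p_per_year, years):
    """Chance of at least one ruinous year, assuming independent, identical years."""
    return 1 - (1 - p_per_year) ** years


for years in (1, 10, 50, 100):
    print(years, round(probability_of_ruin(0.02, years), 3))
```

With a 2% yearly chance of losing everything, the chance of having been wiped out at least once is roughly 87% after 100 years, and it keeps climbing toward 1 as the horizon grows.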

The Author’s Conspicuous Faults

I believe most readers will often find the author’s comments controversial and provocative, if not arrogant and overgeneralizing. There’s a bunch of stuff he said that is just plain wrong.

He said in the beginning of the book that he didn’t rewrite according to his editor’s suggestions, because he didn’t want to hide his personal shortcomings. But the point of a nonfiction book that is non-autobiographical is not to convey who you are, but to give readers inspiration and a positive influence. If you say a bad thing in the book that you believe in, you’re not “being true”, you’re a bad influence! I don’t know exactly which shortcomings he was referring to, but I suspect they include the following points.

He’s exceptionally arrogant, way off the charts. You’ll see him saying things like “I know nothing about this, despite having read a lot into it” and “I know nothing, but I am the person that knows the most about knowing nothing”. He just couldn’t write one sentence that ends in a defeated tone. Before he puts a period down, he must add another clause to the sentence to remind the readers that he’s just being humble, he didn’t mean it. It’s quite funny when you look for it.

He also loves stereotyping people to the extreme. He would say things like “journalists are born to be fooled by randomness”, “MBAs don’t know what they’re doing”, “company executives don’t have visible skills” and “economists don’t understand this whatever concept”. One thing he said in the beginning of the book was that he didn’t need data to back up his claims, because he’s only doing “thought experiments”. I think he mistook that for “unfounded personal opinions”. When you make claims about journalists and economists being dumb, that’s hardly a thought experiment. You absolutely need to back up your claims.

Overall, this book has some good ideas, but not that many. If you already have a decent background in math, maybe you can skip this book without harm.

Catenary Inversion: Curves of Sagrada Familia

Sagrada Familia is stunning and beautiful. If you ever go visit, don’t miss out on the bottom level: there are exhibitions about the construction and history of this masterpiece by Gaudi. When I visited a while ago, I was surprised to find a model that explained how the curves of the arches of the church were designed, and it was really cool.

To start out, imagine you’re building the roof of a house. Usually they are like this: /\. Modern houses also look like this: Π. The point is, a flat roof is hard to support, so the older houses all have angled roofs. If you hold a dumbbell horizontally to your side, you’ll feel tired a lot faster than holding it angled upwards. This is because materials in general are a lot better at handling compression than bending forces. By holding your arm at an angle, you are supporting part of the weight by compressing your arm along its own direction, reducing the amount of force perpendicular to that direction. Back to the roof: a flat one is fine if made of concrete and steel, but if we use a long piece of wood, maybe not.

Anyway, using the same materials, an arch shaped building will last much longer than a flat topped one, simply because the bricks are subject to bending forces to a lesser extent. The problem then becomes: how can we find the shape that minimizes the bending force at every point on the arch (to zero, actually)?

If you remember high school physics, we can dive into it. Say we draw an arch like this: ∩, and we pick any brick on the arch (say on the left half). Let’s pretend this is the ideal curve, such that there is no bending force anywhere. This little brick you picked is going to have a tiny bit of mass, and the slope of the arch changes a little bit before and after it. Then we have three forces acting on this brick: gravity, which points down, the force from the brick on its left supporting it, and the force from the brick on its right. The latter two forces have slightly different slopes, and the three add up to 0 (otherwise the arch will collapse. Note that we don’t have forces perpendicular to the arch between bricks, which is the whole point.) Oh no, we have a differential equation! It’s been 2 days since I took that exam, I have forgotten everything! What do I do?

The Catenary Curve

Consider this seemingly unrelated physical problem: take a string with multiple beads on it at regular intervals, and hold the two ends loosely so that it forms a U shape. What properties does this shape satisfy? As before, consider a single bead (again on the left side, but as they say: WLOG). There are three forces acting on this bead: gravity pulling it down, the force from the left bead pulling it up, and the force from the right bead. You probably saw this coming: these three forces are the same as those we just talked about for the bricks, but pointing in exactly the opposite directions! (I am too lazy to draw a picture, but you can try to get the idea.) What’s more, these three forces also add up to 0, since the beads are not moving. This means that we can take the shape of the string, turn it upside down, and make a perfect arch out of it. To see why this is true, imagine you draw the perfect string curve on paper, along with the forces. Now rotate the paper 180 degrees and negate all three forces on each bead. (1) The three forces still sum to 0; (2) gravity still points down with the right amount; (3) the remaining two forces still balance out with the forces on the nearby bricks, since those are also negated. Hence, this curve satisfies all of our requirements. We have found the answer! If I recall correctly, architects actually used this method to draw the curves for blueprints. This curve of beads on a string is called the catenary curve, and the solution takes the form of the hyperbolic cosine, which is essentially a sum of two exponential functions whose exponents have equal magnitude and opposite signs.
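For anyone who wants to check the hyperbolic cosine claim without redoing the exam, here is a numerical sketch. It uses the continuous limit of the beads-on-a-string picture, where force balance gives the standard catenary equation y'' = \frac{1}{a}\sqrt{1 + y'^2}; the scale parameter a and the function names are my own choices for illustration.

```python
import math


def catenary(x, a=1.0):
    """Height of a hanging chain: y = a * cosh(x / a)."""
    return a * math.cosh(x / a)


def arch(x, a=1.0):
    """The same curve turned upside down, used as the arch profile."""
    return -catenary(x, a)


def ode_residual(x, a=1.0, h=1e-4):
    """Residual of y'' = sqrt(1 + y'^2) / a, estimated with central differences."""
    y_prime = (catenary(x + h, a) - catenary(x - h, a)) / (2 * h)
    y_second = (catenary(x + h, a) - 2 * catenary(x, a) + catenary(x - h, a)) / h**2
    return y_second - math.sqrt(1 + y_prime**2) / a


# The residual should be tiny everywhere if cosh really solves the equation.
residuals = [abs(ode_residual(x, a=2.0)) for x in (-1.5, -0.5, 0.0, 0.5, 1.5)]
print(max(residuals))

# cosh is half the sum of two exponentials with opposite exponents.
print(catenary(0.7), (math.exp(0.7) + math.exp(-0.7)) / 2)
```

The residual is zero up to finite-difference error, which is the numerical version of “physically solved”.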

As a personal anecdote, a physics professor of mine once called a problem “physically solved” after he wrote out the equations which uniquely determine the answer, because the rest can be solved by mathematics, either analytically or numerically. This can lead to very uninteresting tests, for example simply writing out the Maxwell equations for every single EM problem. In our case, the arch problem is “physically solved” not only because we can derive the curve using the same differential equation as for catenary curves, but also because we can “physically solve” it using beads, a string and actual physics.

Stars Falling From the Sky (and how to capture them)

This post is about how I made this photo.

Things I used

Hardware: Pixel 2 XL, Ubuntu desktop

Software: Camera FV-5, Snapseed (both free from Play Store), python (with libraries of course)

Taking the pictures

Go somewhere dark where you can clearly see the stars, bring a tripod, and point your phone’s camera at the stars. Open the Camera FV-5 app, set ISO to 3200 (max), shutter speed to 4.0″ (max). Shooting utilities -> Intervalometer -> mode: Interval + shooting duration, Every 4 seconds, 20:00 shooting time. Press Start Now and wait for 20 minutes…

One thing to note, though, is that there is something I don’t understand in this process. I always end up getting half the number of photos I’m supposed to get, and between successive photos there seems to be a time gap. I suspect the reason is the processing time of the camera, but it could also be the case that I don’t know what I’m doing.


I’ve tried to find other people’s tools before, but they mostly didn’t work. Well there is only one solution.

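import argparse
import glob
import os

import cv2
import numpy as np
from tqdm import tqdm

parser = argparse.ArgumentParser(description='Stack star photos by per-pixel maximum.')
parser.add_argument('img_dir', help='directory containing the PNG frames')
parser.add_argument('output_file', help='path of the stacked image to write')
parser.add_argument('--reversed', action='store_true', help='reverse default image order')
parser.add_argument('--attenuation', type=float, default=1.0, help='dimming images')

args = parser.parse_args()

# Frames are processed in filename order, assumed to match shooting order.
images = sorted(glob.glob(os.path.join(args.img_dir, '*.png')))
n = len(images)
print('{} images found.'.format(n))

if args.reversed:
    images.reverse()

# Accumulate in floating point so repeated dimming does not lose precision.
buf = cv2.imread(images[0]).astype(np.float64)

for i in tqdm(range(n - 1)):
    new_image = cv2.imread(images[i + 1]).astype(np.float64)
    # Dim what has accumulated so far, then keep the brighter value
    # for every pixel and color channel.
    buf = np.maximum(buf * args.attenuation, new_image)

# Convert back to 8-bit before writing.
cv2.imwrite(args.output_file, np.clip(buf, 0, 255).astype(np.uint8))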


pip install whatever you don’t have (cv2 is installed as “pip install opencv-python”). The idea of image stacking is very simple. In the most basic case, you only want to take the maximum brightness of every pixel and every color channel. In this case, once a star has brightened up a pixel, it will stay bright in the final picture. However, to achieve the “falling stars” effect, you need to dim the brightness of stars in their earlier, higher positions. Therefore there is an “attenuation” parameter in the above code: the accumulated image is multiplied by it before each new photo is merged in, so older parts of the trail fade. In this case, I used 0.99 to produce the final image.
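To see what the attenuation does without any photos at hand, here is a toy sketch of the same stacking rule on a synthetic one-row “sky”. The `stack` helper and the tiny frames are made up for illustration; only numpy is needed.

```python
import numpy as np


def stack(frames, attenuation=1.0):
    """Running per-pixel maximum, dimming the accumulated image before each new frame."""
    buf = frames[0].astype(np.float64)
    for frame in frames[1:]:
        buf = np.maximum(buf * attenuation, frame.astype(np.float64))
    return buf


# Synthetic 1x5 sky: a single star of brightness 200 moves one pixel per frame.
frames = []
for position in range(5):
    frame = np.zeros((1, 5), dtype=np.uint8)
    frame[0, position] = 200
    frames.append(frame)

print(stack(frames, attenuation=1.0))   # plain maximum: a uniformly bright trail
print(stack(frames, attenuation=0.5))   # attenuated: older positions are dimmer
```

With attenuation 0.5 each older position is half as bright as the next one, which is an exaggerated version of the fade that 0.99 produces over a few hundred real frames.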

Before stacking, the photos always look very noisy. But it’s ok because the noise will be averaged out in the stacking process.

Fine tuning

My current favorite way to tune photos is the Snapseed app. Send the photo back to your phone and open it up in Snapseed. I usually play with Curves, White Balance, Crop, Selective and Vignette. Obviously if you are slightly less amateur, you would use more professional software. But I find this app sufficient for what I do.

If You Really Want A Professional Photo Though

You obviously need a legit camera and a nice view on the ground to complement the sky. And you’d probably use Photoshop for post processing as well. But it’s ok, you can still use this python script to stack the photos.