Stars Falling From the Sky (and how to capture them)

This post is about how I made this photo.

Things I used

Hardware: Pixel 2 XL, Ubuntu desktop

Software: Camera FV-5, Snapseed (both free from the Play Store), Python (plus a few libraries, of course)

Taking the pictures

Go somewhere dark where you can actually see the stars, bring a tripod, and point your phone’s camera at the sky. Open the Camera FV-5 app, set the ISO to 3200 (the max) and the shutter speed to 4.0″ (also the max). Then go to Shooting utilities -> Intervalometer -> mode: Interval + shooting duration, every 4 seconds, 20:00 shooting time. Press Start Now and wait for 20 minutes…

One thing to note, though: there is a part of this process I don’t understand. I always end up with half the number of photos I’m supposed to get, and there seems to be a time gap between successive photos. I suspect the camera’s processing time is the reason, but it could also be that I don’t know what I’m doing.

Stacking

I tried to find other people’s stacking tools first, but they mostly didn’t work for me. So there was only one solution: write my own.


import os
import glob
import argparse

import cv2
import numpy as np
from tqdm import tqdm

parser = argparse.ArgumentParser(description='Stack night-sky frames into a star trail image.')
parser.add_argument('img_dir', help='directory containing the input .png frames')
parser.add_argument('output_file', help='path of the stacked output image')
parser.add_argument('--reversed', action='store_true', help='reverse the default (sorted) image order')
parser.add_argument('--attenuation', type=float, default=1.0,
                    help='per-frame dimming factor; 1.0 keeps full trails, smaller values fade them out')
args = parser.parse_args()

images = sorted(glob.glob(os.path.join(args.img_dir, '*.png')))
n = len(images)
print('{} images found.'.format(n))

if args.reversed:
    images.reverse()

# Start the buffer from the first frame, in float so attenuation doesn't lose precision.
buf = cv2.imread(images[0]).astype(np.float64)

for i in tqdm(range(n - 1)):
    new_image = cv2.imread(images[i + 1]).astype(np.float64)
    # Dim everything accumulated so far, then keep the brighter value per pixel and channel.
    buf = np.maximum(buf * args.attenuation, new_image)

cv2.imwrite(args.output_file, np.clip(buf, 0, 255).astype(np.uint8))
print('Finished.')

pip install whatever you don’t have (cv2 is installed via “pip install opencv-python”). The idea of image stacking is very simple. In the most basic case, you just take the maximum brightness of every pixel and every color channel across all frames: once a star has brightened a pixel, that pixel stays bright in the final picture. To achieve the “falling stars” effect, however, you need to dim the stars at their earlier, higher positions, which is what the “attenuation” parameter in the code above does. I used 0.99 to produce the final image.
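For reference, here is how you would run the script above (assuming you saved it as stack.py; the name is just a placeholder):

python stack.py ./photos/ stacked.png --attenuation 0.99

Add --reversed if the trails fade in the wrong direction for your framing.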

Before stacking, the individual photos always look very noisy, but that’s OK: the noise largely washes out in the stacking process.

Fine tuning

My current favorite way to tune photos is the Snapseed app. Send the stacked photo back to your phone and open it in Snapseed. I usually play with Curves, White Balance, Crop, Selective and Vignette. Obviously, if you are slightly less of an amateur, you would use more professional software, but I find this app sufficient for what I do.

If You Really Want A Professional Photo Though

You obviously need a proper camera and a nice view on the ground to complement the sky, and you would probably use Photoshop for post-processing as well. But that’s fine; you can still use this Python script to stack the photos.

Tinkering with Deep Learning Style Transfer

A while ago, Prisma was quite popular on social media and everyone was filtering pictures with its artistic filters. Got some free time yesterday, so I thought I should try out some neural network style transfer apps.

A few words about what deep learning style transfer does. It takes two pictures as input: a style image and a content image. The style image is supposed to be an artistic piece with a distinct look, and the content image is normally a photo you took. The algorithm then produces an output image that uses the artistic style of the style image to draw the objects shown in the content image. You will see some examples below.

First I went to Google’s Deep Dream. The content image:

contentimage

This is a picture I took in New York, and you probably recognize it as I used this picture as the banner of this blog.

And the style image:

5-centimeters-per-second-14515

This is an artwork from Shinkai Makoto’s movie 5 Centimeters Per Second.

Alright, so I uploaded both pictures to Deep Dream and this is what I got:

dream_f9cc0d2ae9

That’s pretty cool actually. You can see the color palette is transferred over accurately, and the objects are clearly visible. Here are a few pluses:

  1. The color palette is carried over faithfully from the style image. There was a lot of yellow in the content image but none in the style image, so the yellow was removed entirely.
  2. The most visibly styled objects are humans. You can see the clothes turned into a gradient, and the shapes getting abstracted a little bit.
  3. Generation was relatively fast; I only waited a few minutes. It was also free, so I can’t complain.

However, there are two things I was not satisfied with:

  1. The resolution is pretty low and it looks quite compressed. The pictures I uploaded were HD, and I got back a tiny 693×520 picture.
  2. There are many visible artifacts in the sky. That is understandable, since there were clouds in the content image and a lot of electric cables in the style image, but it looks like the optimization was stopped prematurely.

Therefore, I decided to pull the source code and run it for myself.

First attempt

First I Googled “style transfer deep learning” and found this Github repo. I ran it on my Mac, and the installation instructions were quite clear. With all default settings, I got these results:

out_200

out_400

out_700

out

These four images were produced sequentially, and as you can see the quality got better and better over time. There are no more line-shaped artifacts in the sky, but you can still see a few red and green dots close to the skyline. Overall it looks similar to the one generated by Deep Dream, but I like the blue cloud in the center more.

However, these pictures are even smaller! The images are only 512 pixels wide, and they already took my Mac 2 hours. It’s partly my fault for running deep neural nets on a MacBook Pro without a GPU, but I really want to generate larger and clearer pictures, and if 512 pixels already takes this long, a picture four times larger is going to take much, much longer. So I Googled again for a faster solution.

Second attempt

With some more Googling I found this repo. It is written by the same person as the first one, Justin Johnson from the Stanford lab. Installing all the dependencies was more painful, and I had to modify some code to get it to compile, but eventually I got it to work. The README claims that generation is hundreds of times faster and supports near-real-time video style transfer, so it should be good. Some results:

out1024_mosaic

out1024_scream

out1024_starry

These pictures are styled with the pre-trained models, and even at a width of 1024 they are generated almost in real time. The models were trained on Starry Night, The Scream and a window mosaic artwork, respectively. They are actually very lovely! The brush strokes are vivid, and the image quality is high.

But where’s my Shinkai Makoto?

It turns out that if we already have a model trained on a style image, then generating from a content image is very fast. But we need to train a new model for each new style image, and that, I assume, is what takes the time. Unfortunately I didn’t do it, for reasons explained below.

Sacrificing my CPU for art and science (and reading papers in the meantime)

Since I really want to make this one good picture, I am going back to the original code. Not only does generating a large picture take a lot of CPU time, it also takes a lot of disk space; I had to delete some huge software I never used to free up enough room for it. It looks like the program is going to run for approximately a day. Meanwhile, I should read the papers behind the code above, and maybe study the code a little bit.

Here’s the paper for the original algorithm by Gatys, Ecker and Bethge, and here’s the website for the faster code by Johnson. In my understanding (which could very well be wrong), here are the TL;DRs:

Original paper:

  1. The outline of the algorithm is to extract the style and the content of an image separately, then start over with a random noise picture, modifying its pixels slightly, many times over, until it matches the style of the style picture and the content of the content picture.
  2. We already knew how to extract the content of a picture before this paper was published. There is a freely available pre-trained model called the VGG network, which is basically a computational graph with a fixed structure and fixed parameters, and it is known for identifying objects about as well as humans do.
  3. The way VGG, or any other convolutional neural network, works is roughly the following. On each layer we have an input vector of numbers, and we carry out certain mathematical operations on them: multiply some of them together, add them, add a constant, scale them, tanh them, take the max of them… all sorts of math, to generate a new vector of numbers. A deep neural network has many layers, one layer’s output feeding into the next layer’s input. If you just do random math, the generated numbers will be meaningless, but a “trained” network like VGG produces vectors of numbers that are meaningful. Maybe the 130th number in the output vector indicates how likely it is that the picture contains a cat, that kind of thing. Rumor has it that the field of computer vision was started primarily to deal with cat pics.
  4. A convolutional neural network is just a neural network with a special set of mathematical operations that are designed to capture the information in a picture, as it employs a hierarchical structure of calculations that preserves the 2D structure of pixels.
  5. The key breakthrough of this paper: we already knew that feature activations represent the objects. But if we look at the different “features” (channels) of a certain layer and take the correlation matrix of their output signals, we obtain the style information.
  6. So step one: run the content picture through VGG and capture the output signals of a certain layer. Step two: run the style picture through VGG and capture the correlation matrix of the output signals from the features of a certain layer. Step three: start with a random noise picture, run it through VGG and capture its content and style information just as above. Step four: compare our random picture against the captured signals, and figure out how to change the random pixels a little bit so that the style matches the style picture and the content matches the content picture. Step five: go back to step three until your computer crashes and burns. Step six: output the “random picture” – it’s not random anymore! (A small numpy sketch of the content and style comparisons follows this list.)
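To make the comparisons in steps one and two a bit more concrete, here is a minimal numpy sketch of the content and style losses, assuming we already have the feature maps from some VGG layer (the names, shapes and normalization here are made up for illustration, not taken from the actual code):

import numpy as np

def gram(features):
    # features: (channels, height, width) activations from one layer.
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    # Correlations between channels: this is the "style" of the layer.
    return f @ f.T / (c * h * w)

def content_loss(gen_feat, content_feat):
    # How far the generated image's activations are from the content image's.
    return np.mean((gen_feat - content_feat) ** 2)

def style_loss(gen_feat, style_feat):
    # Compare Gram (correlation) matrices rather than raw activations.
    return np.mean((gram(gen_feat) - gram(style_feat)) ** 2)

The full loss is a weighted sum of the content loss at one layer and the style losses at several layers, and steps three through five repeatedly nudge the pixels to reduce it.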

As you can imagine, changing the pixels a little bit at a time until they eventually look like something is never going to be fast.

Faster code:

  1. The original paper framed the problem as an optimization problem: we have a function f(x), and we want to find the x that maximizes or minimizes it. That fits here if we think of the output picture as x, and the combined difference in style and content between x and our desired pictures as f(x). Then f(x) is our loss function, and we are trying to minimize it; the style and content images are baked into the loss function.
  2. This new paper, however, frames the problem as a transformation problem: we have an input x, and we want to compute y = g(x). This is actually a very natural way to think about it, because we have two input pictures and want one output picture, so x could be the style and content images, y our generated picture, and g our algorithm.
  3. Finding an unknown function is what machine learning does best: first build a really dumb robot, then give it some input x0, and it spits out some random guess. You say no, no, no, bad robot; you should have said y0. It’s really dumb, so it only remembers a little bit. Then you move on to the next input x1, and so on, until the robot learns some patterns from your supervision and starts to make sense. So one way to solve the transformation problem of style transfer could be: collect many style and content pictures, run them through the slow code to generate output pictures, then build a dumb robot and teach it the corresponding input/output pairs until the robot can do it by itself.
  4. All of the above was prior knowledge to the paper, and this approach has a great advantage over the old one: it is very fast and simple to generate a new picture now. We don’t have to guess anymore; just throw the pictures at the robot and it will instantly give a new one back to you. The downside of this approach, of course, is that getting the robot in the first place can be very expensive; you need to generate many thousands of pictures through slow code before you can generate one picture through the robot.
  5. The key insight of this paper is a more subtle, technical one: when we teach the robot how to turn x into y, we don’t just compare the robot’s output to the picture we want. Instead, we run the output image through the VGG network to extract its style and content, and use the style and content differences to teach the robot how to do better. Teaching the robot has a formal name, “back propagation”, because of how it is carried out in practice. This approach gives higher quality pictures. (A rough sketch of one such training step follows this list.)
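Here is that sketch, in PyTorch purely to show the shape of the idea; the tiny networks and random tensors below are placeholders I made up, not the architecture from the repo:

import torch
import torch.nn as nn

# Stand-ins: in the real system the transform net is much deeper and the
# feature extractor is a pre-trained VGG. These tiny modules are placeholders.
transform_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(16, 3, 3, padding=1))
features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
for p in features.parameters():
    p.requires_grad_(False)  # the feature extractor stays frozen, like VGG

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

opt = torch.optim.Adam(transform_net.parameters(), lr=1e-3)
content = torch.rand(1, 3, 64, 64)  # one training photo (random noise here)
style = torch.rand(1, 3, 64, 64)    # the single fixed style image

# One training step: the "robot" (transform_net) gets updated, not the pixels.
out = transform_net(content)
loss = ((features(out) - features(content)) ** 2).mean() \
     + ((gram(features(out)) - gram(features(style))) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()

The expensive part is repeating this step over many thousands of training photos; after that, generating a stylized picture is a single forward pass through transform_net.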

Although training can be more expensive, generating new pictures can now be done in real time, which is great for commercialization. Say a company trains many models on some distinctive artistic styles; then when users upload a picture, they can get instant artistic filters provided by the company. That’s basically what Prisma does, I suppose. For my purpose, though, it will not be any faster than the optimization approach.

There is an exciting new development from Google as well. It builds on top of Johnson’s work and allows interpolation between styles, so you can mix Van Gogh with Monet, for example. It came out just a month ago! Since they also released the code, I’m going to try it out a little bit. Here’s a quick Monet style:

stylized_0_1000_1_000_2_000_3_000_4_000_5_000_6_000_7_000_8_000_9_000

It’s alright, but it doesn’t look too great. Probably Monet’s brush strokes are too small, so this big picture just looks textured instead of styled. Unfortunately, training a new model takes YUGE space, like 500GB. YUGE. This is why the transformation approach is not suitable for a random individual like myself: training a model is very demanding in resources, and the benefits don’t outweigh the costs. Even more sadly, attempting to run this crashed my computer, and I had to restart my 1024-pixel Shinkai Makoto picture after it had been running for 18 hours.

Anyway, done with reading papers, I’m just going to sit here and wait for results. After about a day of computation:

ultimate

…I should really get myself a GPU.

EE354 Project: Minesweeper in Verilog

I finished a project a few days ago for EE354 at USC: Introduction to Digital Circuits. The final project for the class requires us to write a program in Verilog to implement whatever we want, as long as it’s not too trivial. So I took a few days and built this minesweeper game. The source code, along with the compiled programming file, can be downloaded here. It’s not too difficult; in case you are going to work on something similar, here are a few remarks.

Hardware/Software Specifications

This project was created for the Nexys 3 board using Xilinx ISE 14.7, and programming is done with the Digilent Adept software. If you want to try it out, program the bit file ms_top.bit onto your board. To play, connect the board to a VGA monitor. Control is done with the five buttons on the board and a switch, and you watch the monitor to play. There is additional debugging output on the LEDs.

How To Play

Use the up, down, left and right buttons to move the currently selected grid, and press the center button to click on it. There is no support for flagging a mine, because there are only five buttons and I didn’t want to complicate things further. Keep switch 0 turned off. On the left is a game board of 16 x 16 grids with 40 hidden mines; on the right is a smiley face indicating the status of the game (playing, lost or won), just like the game on a Windows machine, plus a timer that counts in seconds.

Remarks

Compiling the project takes about ten minutes each time, so during that time I usually play the old version “to find bugs”.

To display the digits, I created a 30 x 30 bitmap for each digit, as well as one for the mine. That took a while.

The smiley face, however, is done by equations (or inequalities) and not bitmaps, since the bitmaps would be huge otherwise.

The VGA display code is modified from a code template. The timing generator should be the same across all Verilog VGA projects for this class of FPGA boards.

At one point I encountered an error saying “your design does not fit in this device”, so I pulled a combinational block of code out of the display module, made a separate small module for it, and instantiated that inside the display module. That worked; I have no idea why.

Randomization is done by first creating a pseudorandom sequence; here I used a simple linear recurrence, x_{n+1} = a * x_n + b, to generate it. Then, for each new game, I initialize the 40 mines in the top left corner and randomly swap the states of two grids 1000 times. That gives a satisfactory result. (A small Python sketch of the idea is below.)
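Purely for illustration, here is the same idea in Python; the constants, seed and function names are made up, not the ones in my Verilog:

A, B, M = 1103515245, 12345, 2 ** 16  # arbitrary constants; in hardware the register width provides the modulus

def lcg(seed):
    # Pseudorandom sequence from x_{n+1} = a * x_n + b (mod M).
    while True:
        seed = (A * seed + B) % M
        yield seed

def place_mines(seed=1, size=16, n_mines=40, n_swaps=1000):
    # Start with all mines packed into the top left corner...
    board = [True] * n_mines + [False] * (size * size - n_mines)
    rng = lcg(seed)
    # ...then shuffle by swapping two random grids many times.
    for _ in range(n_swaps):
        i = next(rng) % (size * size)
        j = next(rng) % (size * size)
        board[i], board[j] = board[j], board[i]
    return board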

Last but not least, the effect that opens up an entire connected empty area is done naively. With a normal CS approach, I would have written a BFS or DFS to flood the neighbors, but that is more complicated in an EE setting: instead of data structures like vectors or recursive functions, we only have wires, registers and modules. What I did instead is that on each clock cycle, I go through all the grids and check whether each grid should be cleared based on its neighbors. The trick is that this loop over all grids runs in parallel in hardware, so checking all 256 grids takes the same amount of time as checking one. (A rough software sketch of this per-clock step is below.)
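In software, the same per-clock step might look roughly like this; the names and data layout are made up, not translated from the actual Verilog:

def step(revealed, counts, size=16):
    # One "clock cycle": reveal any grid adjacent to a revealed grid with zero neighboring mines.
    new = [row[:] for row in revealed]
    for r in range(size):
        for c in range(size):
            if revealed[r][c]:
                continue
            # Check the 8 neighbors; in hardware every grid does this check simultaneously.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < size and 0 <= nc < size:
                        if revealed[nr][nc] and counts[nr][nc] == 0:
                            new[r][c] = True
    return new

Repeating this step until nothing changes floods the whole connected empty region, which is what the hardware does clock by clock.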

Some other groups in the past have done some pretty awesome work: one group made a maze that rendered the walls and the ground in 3D perspective, and another made a two-player Bomberman. If you try hard enough, this final project can be a lot of fun.

CS201 Project: Tank War

Last semester, for CS201 at USC, my group wrote a game project for the class. It’s not exactly the most innovative or beautifully designed game ever, but the gameplay is pretty fun when it doesn’t lag.

Here’s a link to the executable JAR file: Download

 

Gameplay

First you register a name and set a password, and make sure it has at least one number and one uppercase letter (don’t ask me why).

Then you either create a room or join one. So yeah, those who don’t have any friends can go home now.

Each person picks a color.

Basically you control a tank and shoot the other players, who respawn after a while. Pretty typical stuff, but there are two distinguishing features:

  1. Bullets bounce: a bullet doesn’t die until the third time it bumps into something;
  2. Tanks can move in all directions, not just the four WASD directions.

These features make the game quite different.

 

Development

My team had five people:

Yifan Meng

Anne Yu

Erdong Hu

Hexi Xiao

…and Me

We spent at least two weeks on this project, with at least one all-nighter, and ended up with a monster of roughly 6k lines of Java code.

The game server runs on the EC2 machine that powers this website, which is the lowest tier, so please tolerate the lag. I’ll probably keep the server running until my credit card runs out of money.