Monday, June 29, 2020

Deep Chernoff Faces

One of my favorite¹ concepts for multi-dimensional data visualization is the Chernoff Face. The idea here is that for a dataset with many dependent variables, it is often difficult to immediately understand the influences one variable may have on another. However, humans are great at recognizing small differences in faces, so maybe we can leverage that!

Tired: traditional plotting

The example from Wikipedia plots the differences between a few judges on some rating dimensions:
which already improves on the "traditional" way to present such data, which is something similar to a line chart:

or a radar chart:


Clearly² the Chernoff faces are the best way to present this data, but they leave some of the gamut unexplored.

Wired: using GANs to synthesize faces

Over the past couple years, NVIDIA has worked on generating faces using the techniques from generative adversarial networks, and have published two papers on their results and improvements to their generation techniques. This is the neural network structure that is responsible for the system behind thispersondoesnotexist.com and myriad other clones such as thiswaifudoesnotexist.net, thisfursonadoesnotexist.com, thiscatdoesnotexist.com, thisrentaldoesnotexist.com, and the list goes on and on. The faces from these networks have even been used in international espionage to create fake social media profiles.

The GAN technique is pretty easily summarized. You set up two neural networks, a generator, which tries to generate realistic images, and a discriminator, which tries to distinguish between the output of the generator and a real corpus of images. By training the networks together, adding more layers, fooling around with hyperparameters and overall "the scientific process", you manage to get results where the generator network is able to fool the discriminator (and humans) with high-quality images of faces that do not actually map to anybody in the real world. StyleGAN improves on some of the basic structure here by using a novel generator architecture that stacks a bunch of layers together and ends up with an intermediate latent space that has "nice" properties.

Basically how it works.

A trained StyleGAN (1 or 2; the architecture for the dimensions does not change between versions), at the end of the day, takes a 512 element vector in the latent space "Z", then sends it through some nonsense fully-connected layers to form a "dlatent"³ vector of size 18x512. The 18 here indicates how many layers there are in the generator proper—the trained networks that NVIDIA provides produce 1024x1024 images; there are 2 layers for each of the dimensions from 2^2=4 to 2^10=1024.

What part of STACK MORE LAYERS did you not get?

The upshot is that the 18x512 space has nice linearity properties that will be useful to us when we want to generate photorealistic faces that only differ on one axis of importance. The authors of the NVIDIA paper call each of the 18 layers a "style", and the observation is that copying qualities from each style gives qualitatively different results.

The styles that correspond to coarse resolutions bring high-level aspects like pose, hair style, face shape; the styles that correspond to fine resolutions bring low-level aspects like color scheme. But they only roughly correspond to these, so it's going to be somewhat annoying to play around with the latent space! What we need is some way to automatically classify these images for the properties we want to use in our Chernoff faces...

Putting it all together

What I did was take an unofficial re-implementation of StyleGAN2 and run it to generate 4096 random images (corresponding actually to random seeds 0 - 4095 in the original repository). The reason why we're using an unofficial implementation is that the unofficial implementation ports everything to TensorFlow 2, which also enabled me to run it with (a) less GPU RAM (I only have a GTX 1080 at home) and (b) on the CPU if needed for a future project where I serve these images dynamically.

I was also too lazy to train a network to recognize any features, so I fed these images through Kairos's free trial, receiving API responses that roughly looked like this. (There's no code for this part, you can just do it with cURL).

Then, somewhere in this gigantic unorganized notebook that I need to clean up, I train some linear SVMs to split our sample of the latent space; eventually I will clean this up and make it so it's a little more automated than the crazy experimentation I was doing. After I train the SVMs and receive the normal vector from their separating hyperplanes, I manually explore the latent space by considering only one style's worth of elements from each of the normal vectors, plotting the results, and seeing what changes about the photos (which could be different than the feature from the API response!). This could probably be automated; I should likely use the conditional entropy of the Kairos API responses given the SVM result to rank which styles are most important and then automatically return that instead of manually pruning styles.

I ended up with seven usable properties plus one I threw out before I got tired of doing this by hand.
  • yaw
  • eye squint
  • age
  • smile
  • skin tone
  • gender
  • hair length
  • quality of photo
I decided not to use quality of photo (you can see the results in the notebook results) because I didn't want half the photos to just look terrible. The good news is that I made a new notebook for generating the actual faces, one that should actually work for you if you clone the repo.

After generating a few images and using the `montage` command from ImageMagick, we get our preliminary results!


Now that's the future!



¹ "Favorite" might be code for "useless," going with the theme of this blog.
² Clearly.
³ It's called a dlatent in the code, but the paper calls it the space W and the vectors w. I don't know.
 for q in `seq 0 4095`; do i=$(printf "%04d" $q); curl -d '{"image": "http://cantina.patrickxia.com/faces/seed'$i'.png"}' -H "app_id: xxx" -H "app_key: xxx" -H 'store_image: "false"' -H "Content-Type: application/json" http://api.kairos.com/detect > $i.json; sleep 1; done and something similar for the `media` API, which gives you landmarks. If you care enough, I've hosted the images here, which lets you just look through them if you want.

Monday, January 27, 2020

Interfacing with the real world

I moved to a new apartment building recently1, and have run into the problem of not having a dedicated package room to receive my Amazon packages.

The building is equipped with a call button system, but the buttons don't reliably signal the upstairs unit, and there's no guarantee somebody is at home to let the courier in. The Amazon courier typically attempts to call the phone number that's listed on the package to be let in, but not always. Sometimes they press the button. Mostly, all I get are notifications that look like this, which only get resolved after a bit of back-and-forth regarding "delivery instructions."



Internet of Crap

From the inside, the door control looks like this:


All I need is for some sort of system to automatically press the button that unlocks the door whenever somebody calls the number on the package. Easy enough, right? There are dozens of IoT robotic button pushers, all of which promise to do this sort of thing!

However, experience tells us that this glorious future is in fact entirely crappy. I had previously tried using a Switchmate to control my bedroom light. When pressing the physical button, it somehow only worked eight times out of ten. Even worse, when using the Wi-Fi feature, it worked less often; about six times out of ten. I hate the future so much. I was incredibly happy to dump that piece of junk on the "free table" at work so it could annoy somebody else instead.

So what's there to do? We can cobble our own unreliable IoT solution out of parts that we have lying around! The parts I ended up having lying around that ended up being useful are:

  • An old (original!) Raspberry Pi A (here's the A+, the oldest thing that I can find on Amazon, which features a smaller form factor and more pins on the header)
  • A random 802.11n wifi USB dongle
  • COTO 9007-05 Reed Relays

Past experience has indicated that relays are really the best way to go here instead of a transistor-based approach, which often ends up finicky and not pressing the button reliably enough. In fact, these reed relays were ordered "in anger" after incredible frustration with a prior project. Our old angry self comes to save the day for us!

Figuring out what to connect to what

Taking the door unit off the wall, we see:
which makes it clear that the function of the door button is to short the orange wire to the green wire (apologies for the bad picture -- the traces are easier to see in person).

All we need to do, therefore, is attach the actuation points of the relay across a GPIO pin on the Raspberry Pi and its ground -- that way, when the Raspberry Pi asserts that pin, we energize the relay and end up acting as the button.

I ended up doing this in two phases. The door unit is wired to the apartment complex with just standard cat5 cable. I cut up another cat5 cable and screwed the appropriate wires into the same terminals, then used a RJ45 coupler and soldered onto the other end of a new cable. That way, (a) I wasn't restricted to soldering in the same room as the door unit, and (b) I could detach my Raspberry Pi entirely from the door unit as needed.

Here's a picture of that hacked up solder job:


Note we are indeed shorting the green wire to the orange wire from the cat5 bundle. It doesn't help I used orange to connect the wires to the Raspberry Pi though. Here's everything shoved into a shoebox to protect from environmental damage. Super professional setup here.


Raspberry Pi server

We can write this in Python because this whole thing is a dirty hack.

All we need to do is combine http.server with RPi.GPIO. To stop the whole Internet from being able to actuate our door lock, we only respond to one particular URL, which is a UUID that I generated. You might think this is "security by obscurity." I prefer to say that "we have an authentication system that is backed by a securely generated bearer token."

Every time we get a GET on our HTTP server that matches that UUID, we toggle a GPIO pin high, hold it there for a while, then toggle it back low. We can deal with the concurrency problem like we do with any other problem: by pretending it doesn't exist. Luckily, http.server only handles one request at a time, so it is written with our design goals in mind. If we get two requests, one can wait.

Here's a gist of that first version of the code.

Phone interface 

Now that we have the Raspberry Pi capable of controlling the apartment lock, we need some way to be able to call it from any phone.

I used to use Tropo for all of my CPaaS (... which is apparently "Communications Platform as a Service"; I hate the future) needs because they offered free development accounts that let you experiment all you wanted as long as you were okay with no SLA. Unfortunately, they have since been acquihired by Cisco and their product is no more. It's a shame, because that API was pretty great -- you would get a no-nonsense REST call against your webserver whenever an incoming call came in, which is all we really wanted.

I tried out a couple demos, and Plivo offered me $10 of free credit so I thought that it'd be the best to just get started quickly. However, the console (a) kept bugging out on me and kicking me into creating a new incoming phone number, and (b) was super confusing about how to hook up all the parts together.

I think my original confusion is that all of the demo applications on Plivo are just a statically-hosted XML file, and I didn't think to change the endpoint to access my home web server instead. The fact that the workflow kept kicking me out to "Welcome to Plivo" because it was a demo account also did not help.

All roads in the console seemed to lead to hitting the "PHLO" button. This is apparently a flow chart-based version of "programming a phone system without knowing how to program," so I figured I'd try my hand at it. After struggling a bit, I managed to create this "program" (with secret tokens and phone numbers redacted):



There's actually a bug in this program, but I can't seem to fix it. Bonus points if you figure out what it is! Oh well, it sort of works, which is good enough for me.

Productionizing the solution

After having this system up for a few weeks, it worked "fine." However, there are a few issues with it. We will only fix some of them.
  1. Authentication between the CPaaS and the door opening system is done via a bearer token sent in cleartext.
  2. There is no authentication at all on the phone number side.
  3. The electronics are super janky. The relay is rated for 5 volts and we're actuating it with a 3.3V pin. The solder job is completely hacked together, though the shoebox does offer great protection against the environment.
  4. The system is open loop; if disconnected from the door, the calling user still gets a "success" message.
  5. There's no monitoring of uptime of the system.
We will not deal with deficiencies (1) and (2) at all. If we cared enough:
  • We can use TLS to encrypt the interaction between Plivo and us to prevent the bearer token from being stolen. Even better: Plivo cryptographically signs some headers to indicate that they are legitimately from Plivo, so we can authenticate the client without the added TLS layer and certificate headache. One wrinkle: the documentation and examples given by Plivo are insecure. They don't verify that any nonce is only used once, which opens up any system that uses their example code to replay attacks. This isn't just a problem with Plivo. Twilio doesn't even offer a nonce for this feature. Something something stupid future something.
  • We can demand that the users who call the number type a PIN in, and rotate the PIN per delivery. However, not many people actually randomly call the number, and the threat model isn't great. Just hitting the call buttons at random in an apartment building often gets you somebody who's willing to buzz you in if you claim to have a delivery.
We will deal with deficiency (3) at the same time as deficiency (4). 

Our current system has the great property that we have Galvanic Isolation from the apartment's door mechanism itself. This is another reason why using a simple transistor to allow current to flow through the door button terminals doesn't work well; you'd end up coupling the two systems if they share a common ground. We'd like to maintain the isolation, as it's also not great to couple your own electronics with shared infrastructure for the apartment complex.

What we need to be able to do is observe current going through the relay2. One way to do this while maintaining galvanic isolation is to use one of my favorite devices, the optoisolator. The optoisolator is a combination of an LED and a phototransistor in the same package. If current flows on the LED side, it lights up, and allows current to flow through the transistor. If no current flows, there's no LED, and the phototransistor blocks a current path. I ordered a few PS2501-1 optoisolators from eBay for this purpose.

Here's the updated circuit diagram. We've added an NPN transistor (I grabbed a 2N2222, but the characteristics aren't super important; the important thing is that we're driving the relay with more voltage than the 3.3V before). The 200 ohms for the resistor is chosen to limit flow to less than 16 mA from the pin.



Note that the optoisolator is in one package, although it's drawn as a separate LED and phototransistor here. GPIO14 is configured with a pull-up resistor, which means that if current is flowing through the LED, reading from that pin should indicate a "low" value, whereas if current is not flowing through the LED, the phototransistor blocks current, which means that we would see a "high" value.

Deficiency (5) is easily dealt with third-party tools. Since I already have a Google Cloud account for other miscellaneous things, we can use Uptime Checks within that cloud project to configure notifications whenever the system is down. Uptime Checks are available for free; Stackdriver logging is billed, but there's an always-free threshold of 50 GiB.

Wrapping it all up: the new code, including verifying that the door is actuated before returning and the `healthz` endpoint, is available in this gist.

Note that the healthz endpoint doesn't actually verify that the relay can be successfully energized; however, failures at that point are logged and notified from the CPaaS layer.

Here's an image of the finished hardware setup. I've put the components on some prototype board instead of letting them dangle.


Conclusions

After setting this system up, it still doesn't work right, because many couriers just refuse to read the delivery instructions before texting via the Amazon app. Argh! The next step is to put a little notice on the door to remind the couriers to check the delivery instructions.

The good news is that the system works well for guests; whenever people need access to get to our apartment, we can have them let themselves in without having to use the finicky door buzzer.



1Yeah, this blog post, like all posts, is super late.
2Actually, it might be better to see whether or not there's voltage across the door button terminals. Unfortunately, experiments indicated that enough current to light the LED inside the optoisolator would also trigger the door mechanism, so we have this solution instead: we can only observe the circuit when we are actively changing it. It still solves the open-loop problem.