Recruitment is hard.

Recruitment is one of those things that everyone does, but everyone seems to want to do better. It's also genuinely challenging: it's complicated, it's tiring, and it's personal.

Even if the {company, organization, student group} you're recruiting for has {well-defined, narrow} common goals or interests, the people you will want to recruit aren't likely to fit into any one "type." A primitive "typing" (computing pun not intended) can be based upon a two-axis plot of a person's interest and compatibility.

The four basic types in recruitment based on compatibility and interest

To clarify, compatibility refers to a person's affinity for what your organization does, while interest refers to how interested they are in joining an organization built around those pursuits.

Last Thursday, I ran a discussion-based workshop on recruitment for SIPB, MIT's computing club. In the context of SIPB, "compatible" refers to how interested in or knowledgeable about computing a given person is, and "interested" refers to how interested they are in being a part of a computing organization. I introduced the types mentioned above as a foundation for what I hoped would be a very organic and free-flowing conversation about improving recruitment. It turned out my plot worked.

To start, we discussed some of the reasons that people walked into the SIPB office. They included:

  • wanting to learn about SIPB,
  • desiring use of a stapler, hole punch, or scanner,
  • attending a hackathon,
  • needing help to fix a computer problem,
  • hanging out with friends who spend time in the SIPB office,
  • tooling on a pset with SIPB members, and
  • proving to a reasonable portion of zephyr that you exist outside of the terminal.

We then talked briefly about how likely some of these people were to fall into the various types and emphasized how SIPB's recruitment efforts needed to be tailored not only to the different types of people but also to the different reasons they came into the office. It's a bit strange to spontaneously sell someone on the organization who's just using the stapler, but it's almost logical to inform them of some of the organization's other services.

We found that there was a central theme to what we wanted to accomplish with any interested or compatible person: we primarily wanted to get them to come back to the office. Ultimately, this accomplished more than just keeping someone informed through a set of mailing lists (which they could then filter to an ignorable mail folder): the returning prospective member gets more personalized attention and another opportunity to see the awesome things the office is up to.

We let the conversation flow pretty organically, and some of the other suggestions to reach out to prospective members and to get them involved in the organization included:

  • maintaining a running list of "smaller" tickets which don't require a lot of background knowledge but relate to SIPB projects,
  • emailing prospective members about parts of projects or new project development that have low barriers to entry,
  • updating the website to contain information relevant to prospective members in a clearer, more direct manner,
  • making a point to introduce yourself to an unfamiliar face that sits down next to you,
  • pointing people towards other members whose interests are better aligned to theirs, and
  • following up with someone you talked to for a bit but haven't seen in the office for a week or so.

My observations of the SIPB office over the course of the weekend indicated that some of these are already well on their way to implementation.

The final topic of conversation revolved around how to get people who might be interested in an organization like SIPB but have never set foot in the office to learn more about SIPB. Of course, we mentioned inviting computing types to hackathons and other events. I suggested that subtler methods could be even more effective; I know that I've gone to the SIPB office many times simply because someone asked me where I was headed after class and mentioned they were headed that way.

We certainly didn't touch on everything which could be done to improve recruitment for SIPB; after all, recruitment is not just a hard problem, but one that changes over time as an organization, its members, and its prospective members change. But I also never expected to hold this workshop just this once.

***

I didn't write up too many of the notes about the various types here because they were fairly specific, but I did address it briefly in the comments section of my old blog:

Paul: I liked the two-axis model a lot, and I was hoping you would have talked about it a bit more. To ask a very open-ended question: What should a group do about/with type III people (low interest, low compatibility)?

Me: I intentionally didn't speak too much about it directly because how you handle people in types II and IV - the two groups one usually needs to spend the most time thinking about with respect to recruitment efforts - varies significantly from organization to organization. In terms of type III, the main goal is to be informative. People in this group can move to type II or IV fairly easily. In terms of SIPB, this is about informing people about services like scripts, linerva, and XVM, or if it comes up, mentioning that Debathena is a joint effort between SIPB and IS&T. Hearing about how SIPB's services make their lives easier might make them think of SIPB in a more positive light. They also might tell all their friends about the awesome services we provide - maybe some of their friends will want to get involved!

'First' thoughts on git

I suppose it's more than a slight bit incorrect to state that these are my first thoughts on git; I've certainly already been exposed to git in a variety of ways. I'd always been told that my love of graph theory would convert me over to this different type of version control.

I more or less decided to look into git on a set of whims, yet I was really persuaded to go to the "dark side" because I was strongly encouraged (read: required) to understand the back-end model instead of just memorizing a handful of commands (like with SVN). I'd attempt to do justice to the merits of git's back-end model, but instead, I'll point you to a far more experienced git user's blog post.

My first step to really learning git was to look at a handful of resources.

After an hour or so of reading, my friend Evan and I talked through a bunch of the basic commands briefly and some of the more interesting commands in greater depth. I took notes on easel paper for the basic commands, and we worked through diagrams for cherry-pick, merge, and rebase:

Notes and diagrams from our conversations on git commands

I got fairly excited when I guessed that rebase essentially applies a series of cherry-pick calls to a branch.
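That guess can be sketched in a few commands. Here's a toy repository (all branch and file names here are hypothetical, just for illustration) in which cherry-picking a branch's commits by hand lands you in essentially the same place a rebase would:

```shell
set -e
# Build a throwaway repo to illustrate the idea.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -qb main
git config user.email you@example.com
git config user.name you
echo base > base.txt && git add base.txt && git commit -qm "base"

# Fork a feature branch and commit to it.
git checkout -qb feature
echo feat > feat.txt && git add feat.txt && git commit -qm "feature work"

# Meanwhile, main moves on.
git checkout -q main
echo more > more.txt && git add more.txt && git commit -qm "main moves on"

# `git rebase main` on feature would replay feature's commits onto main
# one at a time; cherry-picking the same range by hand does much the same:
git checkout -qb feature-replayed main
git cherry-pick main..feature >/dev/null

ls  # working tree now contains base.txt, feat.txt, and more.txt
```

(Rebase additionally moves the original branch pointer and can drop or squash commits along the way, so the equivalence is conceptual rather than exact.)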

I decided to start using git much more frequently for my academic projects. I really like the control that my understanding of the back-end model provides, and that control in and of itself is a sufficient reason to consider switching to git. I'll also argue that learning the back-end model is a fun enough exercise to want to switch.

A very MIT signals problem

I've always found signals and systems interesting, as it is one of the most powerful tools out there. Signals and systems can be used to describe many different problems because it is simply an abstraction which describes a physical, mathematical, or computational system by the way it transforms an input signal into an output signal. It's often studied by electrical engineers because it has many direct applications to signal processing and communication systems, but there are lots of applications in other fields.

The signals and systems intro class at MIT, 6.003, is one of the most dreaded and disliked Course 6 classes. The class used to be required for all Course 6-ers, both EE and CS majors, but now that the EE/CS department has switched to a new curriculum, it's only required for double E's. It's a bit unclear where I fall in the Course 6 spectrum, but most of my friends think I'm crazy for having enjoyed 003.

I took 003 in Fall 2009 with Professor Denny Freeman. His approach deviated from the usual approach to the class by

  1. reordering the topics so that Laplace transforms were taught before Fourier transforms and
  2. introducing the concept of "Engineering Design Problems."

The "Engineering Design Problems," or EDPs for short, aimed to show 003 students some tangible applications of the material, and they were open-ended questions which typically required some amount of programming. For most people, these problems made them a bit more excited about signals and systems, but for me, this was all about getting excited about writing little pieces of code.

The most "MIT" EDP was assigned near the end of term:

The following images have been blurred. Figure out a way to sharpen each image to identify the following buildings:

Blurred Buildings assigned

You might be able to guess what some of those buildings are without even seeing the larger images the course staff included in the assignment. (b1 certainly is the most distinctive.)

Of course, there are many ways to blur an image, but looking at this from the simplest 6.003 perspective, it's most likely that either the rows or columns were blurred by a system with a single pole. After all, Denny Freeman is a big fan of Occam's Razor. Running with this assumption, such a system would have a system function with the following form:

H_{blur}(z)=\frac{1-p}{1-pz^{-1}},

where p is the system's only pole. This system would be stable if |p|<1. More importantly, if this system were a low-pass filter, i.e. if 0<p<1, it would blur the image.

You can deblur an image blurred by a single-pole system by applying a system with a single zero:

H_{deblur}(z)=\frac{1-pz^{-1}}{1-p},

where p is this system's only zero and has the same value as the pole in H_{blur}(z).

Since we will want to write code to deblur the image, we will want to get the difference equation corresponding to H_{deblur}(z) to apply to the rows or columns of the blurred images. The corresponding difference equation is:

y_{deblur}[n]=\frac{1}{1-p}(x[n]-px[n-1]).
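As a quick sanity check (a toy 1-D example of my own, not part of the assignment), blurring a sequence with the single-pole system and then applying the single-zero inverse recovers the original values:

```python
# Blur: y[n] = (1 - p) * x[n] + p * y[n - 1], taking y[0] = x[0].
def blur(x, p):
    y = [x[0]]
    for n in range(1, len(x)):
        y.append((1 - p) * x[n] + p * y[n - 1])
    return y

# Deblur: x[n] = (y[n] - p * y[n - 1]) / (1 - p), taking x[0] = y[0].
def deblur(y, p):
    x = [y[0]]
    for n in range(1, len(y)):
        x.append((y[n] - p * y[n - 1]) / (1 - p))
    return x

original = [0, 50, 200, 120, 30, 255, 90]
recovered = deblur(blur(original, 0.985), 0.985)
# recovered matches original up to floating-point error
```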

Now that we may have figured out what's going on generally, let's look closely at image a1:

Blurred Building a1

The blurring in image a1 looks to be primarily horizontal, which means rows of the image's pixels would have been passed through the low-pass filter. To deblur this image, we should try to pass rows of pixels through the deblurring difference equation, y_{deblur}[n]. The rows were processed either causally or anti-causally, i.e. left to right or right to left, respectively.

At this point, we really just have to dive into writing some code. The first deblurring code I wrote was a Python script to deblur a blurred_image from left to right with a pole p and save it to deblurred_image:

from PIL import Image  # the original PIL exposed this as `import Image`

def deblur_left2right(blurred_image, deblurred_image, p):
    original = Image.open(blurred_image).convert('L')  # grayscale
    new_image = Image.new('L', original.size, 0)
    original_pixels = original.load()
    new_pixels = new_image.load()

    # Apply y[n] = (x[n] - p*x[n-1]) / (1 - p) along each row, left to
    # right, copying the first column unchanged.
    for j in range(original.size[1]):
        new_pixels[0, j] = original_pixels[0, j]
        for i in range(1, original.size[0]):
            new_pixels[i, j] = int(
                (original_pixels[i, j] - p * original_pixels[i - 1, j]) / (1 - p))

    new_image.save(deblurred_image)

After playing with different values for p, it was apparent that images a1 and a2 were blurred from left to right with a pole at 0.985, so deblurring them with a system with a zero at 0.985 returned the original images. Here is the unblurred version of a1:

Unblurred Building a1

As you can see, a1 is building 68.

The buildings in the second row, b1 and b2, had their columns blurred from bottom to top with a pole at 0.985, and the third row, c1 and c2, had their columns blurred from top to bottom with a pole at 0.985. You can change the for loops in the Python script to deblur in other directions. I encourage you to also see what happens when you change the value of p.
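As a concrete sketch of that loop change (pure Python, no PIL; the function name is mine), here is the same deblurring applied to columns, top to bottom, where `pixels` is a list of rows:

```python
def deblur_top2bottom(pixels, p):
    # Deblur each column causally (top to bottom) with
    # x[n] = (y[n] - p * y[n-1]) / (1 - p); the first row is copied as-is.
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for i in range(width):
        for j in range(1, height):
            out[j][i] = (pixels[j][i] - p * pixels[j - 1][i]) / (1 - p)
    return out
```

Deblurring bottom to top instead would reference pixels[j + 1][i] and walk j upward from the bottom row.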

Even with "cute" EDPs like this one, 003 last fall was still all about grungy math - signals and systems often are. However, students who hadn't decided that the class would be too terrible before even stepping into 34-101 for the first lecture seemed to enjoy playing with some of the more tangible applications of signals and systems and got a lot out of the class. Hopefully, more people can come to appreciate this class in its own right, and maybe fewer people will shy away from being an EE because they fear 003.

My love-hate relationship with typeface rendering in Ubuntu

We take good, er, at least reasonable, typography for granted all the time. This is especially true when it comes to personal computers because with Microsoft Windows and Mac OS X - upwards of 98 percent of the market - you get characters that are easy on the eye right out of the box.

Let's look at font rendering on Ubuntu. At first glance, it's disappointing. Every time I reinstall Ubuntu, I am bothered by the oddly tall lowercase "l" which is featured on one of the pages on ubuntu.com:

Ubuntu Applications Menu from ubuntu.com

I simply don't find this font rendering as pleasing to the eyes as font rendering on Windows or OS X.

Font Rendering GUI

Fortunately, even though fonts aren't as pretty right out of the box in Ubuntu, you can control your font rendering through an easy-to-locate GUI (System → Preferences → Appearance):

Appearance Preferences: Fonts

A lot of Ubuntu users don't seem to care that much about their fonts and can be satisfied by playing with the four options in this level of the GUI. The problem is that, unlike the average computer user, many people who switch over to Ubuntu are looking for something more than an operating system that works well when you surf the web or check your email. For those who want more than an OS that "just works," these four options may not be enough. As someone who (ignorantly?) grew up on Windows, I just couldn't find an option I liked among these four.

Beyond the fact that these four options don't offer quite enough control, it's a lot easier to talk about fonts in terms of hinting, anti-aliasing (analogous to the Ubuntu option for grayscale smoothing), and subpixel smoothing/rendering. Each of these three properties maps directly to an aspect of how fonts are rendered on your screen.

Ubuntu's GUI-controlled font rendering options can be described in terms of these properties:

  • Monochrome: no smoothing, full hinting
  • Best Shapes: grayscale smoothing, medium hinting
  • Best Contrast: grayscale smoothing, full hinting
  • Subpixel Smoothing (LCDs): subpixel smoothing, slight hinting

I suggest having Firefox open with a word-heavy webpage you frequent, selecting your favorite of the four options, and then clicking the "Details..." button in the Appearance Preferences GUI, so that you can see what happens when you play around with these properties:

Font Rendering Details

Trial and error really is the best way to figure out what you like. (Note: Changing the subpixel order isn't important unless your LCD screen has subpixels in an order other than RGB.) Make sure to look at fonts at a lot of different sizes, and if you generally encounter a variety of typefaces, look at a variety of typefaces, too.

If you're anything like me, you might be curious as to what each of these three properties actually does and why. I won't include pictures of fonts that are rendered with different properties because different people use differently sized and styled fonts, and these differences are affected somewhat differently by the same rendering options.

Hinting

Hinting, also known as instructing, uses mathematical instructions to adjust the display of a font's outline so that it lines up with the rasterized grid. The more hinting, the "crisper" your font appears at small sizes, because hinting instructions seek to preserve detail without risking the outline's clarity when rendering fonts. According to the TrueType Reference Manual:

A quality outline font is one that provides legibility at small sizes on low resolution devices and fidelity to the original design at large sizes and on high resolution devices. Chance effects due to interactions between the outline shape and the placement of the grid should be minimized.

I generally do not like hinting, but I know a lot of people who strongly swear by full hinting for all font sizes.

Anti-aliasing

In digital signal processing, the choice of sampling period, T, directly affects the ability to fully reconstruct the original signal. The sampling frequency is defined as \omega_s=\frac{2\pi}{T}, and if there are frequency components \omega present in the original signal such that \omega >\frac{\omega_s}{2}, artifacts from the higher frequency components of the original signal distort the version reconstructed from the sampling process.

Anti-aliasing is a technique that minimizes the effects of the distortion caused by representing a signal at a lower resolution through sampling. This technique removes the high frequency signal components, \omega >\frac{\omega_s}{2}, which would cause aliasing before sampling.

Personally, I like using anti-aliasing for most, but not all, text; that finer-grained control isn't in the GUI, so you'll need to update your .fonts.conf.

Subpixel Smoothing

Subpixel smoothing uses the fact that each pixel on a color LCD screen is actually composed of individual red, green, and blue subpixel stripes to smooth text with greater detail. If you're still using a CRT monitor, subpixel smoothing probably won't improve your font rendering. Whether or not you enable this property is mostly a matter of personal preference, as it also causes colored pixels to appear around text.

If you do choose to use subpixel smoothing, odds are the default subpixel order (RGB) is what your LCD screen uses; you probably shouldn't change this unless your screen uses a different arrangement.

I prefer not to use subpixel smoothing.

~/.fonts.conf

While you are able to toggle these variables through the Font Rendering Details GUI, you can get a lot more control over your typeface rendering if you use the .fonts.conf XML file in your home directory.

Here is an example of a .fonts.conf file which turns off hinting and subpixel smoothing but enables antialiasing:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
 <!-- Turn on antialiasing -->
 <match target="font" >
  <edit mode="assign" name="antialias" >
   <bool>true</bool>
  </edit>
 </match>
 <!-- Turn off hinting (including the autohinter) -->
 <match target="font" >
  <edit mode="assign" name="autohint" >
   <bool>false</bool>
  </edit>
 </match>
 <match target="font" >
  <edit mode="assign" name="hinting" >
   <bool>false</bool>
  </edit>
 </match>
 <match target="font" >
  <edit mode="assign" name="hintstyle" >
   <const>hintnone</const>
  </edit>
 </match>
 <!-- Turn off subpixel rendering -->
 <match target="font" >
  <edit mode="assign" name="rgba" >
   <const>none</const>
  </edit>
 </match>
</fontconfig>

The above XML handles the lion's share of the work my .fonts.conf file deals with.

While you can control the properties mentioned above through this file instead of the GUI, the real perk of the .fonts.conf file is that you can specify different properties for different font sizes and styles. For example, I prefer hinting without anti-aliasing for very small fonts, and I can control this through the .fonts.conf file.
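For example, a size-conditional rule along these lines (the 8.5-point threshold and the specific settings are just my own illustration, not canonical values) can be added inside the <fontconfig> element to fully hint small text while leaving anti-aliasing off for it:

```xml
<!-- For fonts smaller than ~8.5 points: hint fully, skip anti-aliasing -->
<match target="font">
 <test name="size" compare="less">
  <double>8.5</double>
 </test>
 <edit mode="assign" name="antialias">
  <bool>false</bool>
 </edit>
 <edit mode="assign" name="hinting">
  <bool>true</bool>
 </edit>
 <edit mode="assign" name="hintstyle">
  <const>hintfull</const>
 </edit>
</match>
```

Because later matches override earlier ones, rules like this one should come after your global defaults.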

You can certainly look at a complete manual for .fonts.conf files, but experimenting with parts of other people's .fonts.conf files is also useful because there isn't a right answer to questions of personal preference. Unfortunately, tweaking font rendering to your complete satisfaction often takes a non-trivial amount of time.

A meeting of mindsets

This spring, I finally had time in the right places in my schedule to incorporate teaching into my semester. I'm a Learning Assistant (LA) for 6.042: Mathematics for Computer Science, which offers an introduction to discrete mathematics oriented towards computer science and engineering. The class is taught in a very interactive manner: each hour and a half class session is split into a roughly half hour lecture and a roughly hour long team problem solving session. Additionally, students come from a variety of backgrounds: for many this is their first Course 6 class, whereas some students are Course 6 MEng's; some students have taken many theoretical math classes, while others have only taken one semester of calculus.

As an LA, I work on writing the problems for the class, prepare other materials for the class, coach the students during the class problem solving sessions, and hold office hours. Basically, the roles I've been personally assigned as an LA make me a TA who doesn't have to deal with grading.

This semester's 6.042 students took their second miniquiz two Wednesdays ago, and their quiz grades were finalized this last Wednesday. The biweekly miniquizzes in 6.042 aim to test our students' understanding of the material covered in lectures, class problems, online tutor problems, and problem sets from the previous two weeks. Theoretically, students don't need to put in a lot of studying if they've been keeping up with those four areas of the course, but of course, an individual student may need more or less help. The miniquizzes typically average around 15 points out of 20, with a standard deviation of about 4 points.

Miniquiz two this semester turned out a bit differently. Here is the histogram of total miniquiz scores (I am using an image compiled by another TA, so to clarify, score values are on the x-axis and the number of occurrences is on the y-axis):

Overall Miniquiz Scores

Key statistics for miniquiz 2 grades:

  • 83 students "took" the quiz. It is important to note that 82 students actually took the quiz, and that the score of 0 is for a student who punted the quiz. (We'll be ignoring the student who punted in later statistics.)
  • The mean was 12.50 points.
  • The median was 12.50 points. (No skewness!)
  • The standard deviation was 3.93 points.

Based solely on the above histogram, the quiz scores seem to follow a fairly reasonable bell curve, just with a lower average than expected. From the perspective of the course staff, that would normally just mean we made the miniquiz a bit harder than we'd have liked, but the results of this quiz were more complicated than that because of the scores received on the last problem. The last problem was:

Let [\mathbb{N} \rightarrow \{1, 2, 3\}] be the set of infinite sequences containing only the numbers 1, 2, and 3. For example, some sequences of this kind are:

  • (1, 1, 1, 1...),
  • (2, 2, 2, 2...), and
  • (3, 2, 1, 3...).

Prove that [\mathbb{N} \rightarrow \{1, 2, 3\}] is uncountable.

Hint: One approach is to define a surjective function from [\mathbb{N} \rightarrow \{1, 2, 3\}] to the power set \mathcal{P} (\mathbb{N}).

Having seen proofs for the last six years, the mathematician part of me was not particularly concerned with the difficulty of this problem, and I quickly came up with the following solution (which did not use the hint given):

Proof: Assume that [\mathbb{N} \rightarrow \{1, 2, 3\}] is not uncountable. This means that there is a surjective function f:\mathbb{N}\rightarrow [\mathbb{N} \rightarrow \{1, 2, 3\}]. We will show that this is impossible by describing a sequence s\in [\mathbb{N} \rightarrow \{1, 2, 3\}] such that s\notin range(f).

Let f_0, f_1, f_2, \ldots be the sequences in the range of f. We define a sequence

s=(g(f_0[0]), g(f_1[1]), g(f_2[2]), \ldots),

where f_h[k] is the k^{th} element of sequence f_h and g:\{1, 2, 3\}\rightarrow\{1, 2, 3\} is a function such that g(i)\neq i for i=1, 2, 3.

By the definition of s, s[n]\neq f_n[n] for all n\in\mathbb{N}, proving that s is not in the range of f. However, s\in [\mathbb{N} \rightarrow \{1, 2, 3\}]. Thus, f is not surjective as claimed, and [\mathbb{N} \rightarrow \{1, 2, 3\}] is uncountable. \Box

A slightly different wording of my solution was included in the solutions for miniquiz two, below a solution which uses the hint given.
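The diagonal construction in the proof can be illustrated with a toy, finite truncation (the code and the particular choice of g are mine, not from the quiz solutions):

```python
# Any g with g(i) != i on {1, 2, 3} works; this one cycles 1 -> 2 -> 3 -> 1.
def g(i):
    return i % 3 + 1

# Pretend these four (truncated) sequences enumerate the whole set.
listed = [
    (1, 1, 1, 1),
    (2, 2, 2, 2),
    (3, 2, 1, 3),
    (1, 2, 3, 1),
]

# Flip each diagonal entry: s differs from the n-th sequence at position n,
# so s cannot appear anywhere in the list.
s = tuple(g(listed[n][n]) for n in range(len(listed)))
```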

I was concerned that 6.042 students would find this significantly more difficult than I did, but I dismissed this fear when the rest of the course staff assured me it would be okay. One could say it wasn't too terrible to have put this on the quiz, but let's look at just the results from that problem (again compiled by another TA; score values are on the x-axis and the number of occurrences is on the y-axis):

Problem 3 Scores

Some key statistics about the 82 student scores on problem 3:

  • The mean was 2.72 points.
  • The median was 3.00 points.
  • The standard deviation was 2.54 points.

These "key statistics" really didn't set off that many bells; they just indicate that the problem was hard. However, the way the scores were partitioned alarmed us a lot. Additionally, doing well on this problem did not correlate with having consistently done well on the problem sets, the in-class problems, and the first miniquiz. How did this happen?

Students may have been ill-prepared for crafting a solution to this problem. It is true that they have only seen two proofs similar to this problem: the professor's lecture included a proof which uses an argument similar to the one suggested by the hint, and a problem set problem required a proof structured similarly to my solution. It is reasonable to assume that students are not as familiar with proofs presented during lecture as those they have worked out themselves, but this probably does not fully account for why students performed differently on this problem.

Many students who did get at least partial credit on this problem said that they did not have enough time to think the problem through; they only figured out what they needed to do too close to the end of the quiz. This is probably because the general mindset for 6.042 thus far differs significantly from the mathematical mindset required for proving problem 3's statement, which isn't too surprising given that the mind of a mathematician and the mind of a computer scientist are required to work in (at least partially) different ways. Simply put, problem 3 of the second miniquiz showed me how unnatural theoretical proofs are to computer scientists.

I wonder if this illustrates that a deep understanding of math is becoming less and less important for being a good computer scientist, as many successful MIT graduates take no math courses beyond the general Institute requirements and 6.042. However, I'd like to believe that even if this is the case, difficult problems incite curiosity within my students, causing them to dive deeper into mathematics. After all, having a better understanding of mathematics can't hurt them as computer scientists. At least, the questions students have personally emailed to me since getting their quizzes back support my claim.