What to Interview For

I’ve been helping some startups with their interviewing and recruiting lately. I’ve seen startups which have capable interviewers who lack bandwidth, which feel confident interviewing for some roles but not others, which have an illogical or inconsistent process, and some which have no real process at all. In this post, I will discuss the different types of interviews for technical candidates, and suggest who to interview for what.

Caveat: I am describing what I consider typical interviews. With effort, some startups have successfully reinvented the technical interview by, for example, having the candidate work for the company for a week. (I remember reading about this company a while ago, but forget the name. Can anyone help with a source?)

  • Prescreen for Coding. Becoming more and more common. Some sort of coding problem to be completed prior to speaking with anyone technical.
  • Coding and Algorithms. The most basic kind of technical interview asks you to code a solution to some problem. Solutions usually require algorithms or data structures commonly taught in the 2nd or 3rd year of undergrad CS programs.
  • Architecture. Depending on the company’s space, how do you build systems? As an example, in digital advertising I might focus on large scale systems, failure scenarios, and distributed counting problems. A games company might focus on graphics engines and AI.
  • Team Leadership. Covers sprint planning, software engineering best practices, testing practices, deployment strategies, and generally how to run a team well.
  • People Management. How do you manage people? I also often include hiring processes in here.
  • Team Management. This covers managing teams (e.g. having managers reporting to them). It would also include things like budgeting, company or division policies, and longer term prioritization.
  • Communication. How are they at communicating? Can they discuss technical topics with non-technical folks successfully?
  • Product Management. How do you define software deliverables which meet business needs?

In general, you ask more senior people items further down the list. You also ask less of the basics. Example: a Sr. Manager or Director likely knows how to code or at least did at some point. They should ideally be able to pass your coding question, but if they will not be required to code in this role, then it might be okay to pass them with a mediocre result if they’re strong in other areas. For a mid-level software engineer, their coding is actually the most important component, so you should really focus on this and make sure they are strong in this area.

Job Title Prescreen Coding Architecture Leadership People Mgmt Team Mgmt Communication Product Management
Jr. Software Engineer / College New Hire 1 3
Software Engineer 1 3 1
Sr. Software Engineer 3 1 1
Team Lead 2 1 1 1
Dev Manager 1 1 1 1 1 1
Director 1 1 1 1 1 1 1

As you can see, I would spend much more time interviewing more senior people. This makes sense since you generally want to be more sure of them. I also think it’s usually okay if someone “fails” one of their interviews. Unless they do spectacularly poorly, the other interviews should even it out. Another line of reasoning here is that you want them to grow into the role. If they do amazingly well on all your interviews, then maybe they should be a level higher.

What Does a CTO Do?

For many people who look at the CTO role at a startup from afar, it may not seem like there’s that much to it beyond the purely technical. However, to be successful, the role requires so much more than purely technical talent.

Sure, the CTO should be a strong coder, he should set a technical vision for the organization, he must manage engineers who report to him, and he must communicate clearly to both his employees and “the business”. This is just the tip of the iceberg. Most of the other skills and areas of expertise I’ll list below are learnable and can be developed with experience. Most of them are also taken for granted by non-technical folks when present and are only noticed as needed skillsets when absent.

First off, what are the obvious skillsets needed? These are the ones which should be hard requirements and are difficult or otherwise uneconomical to teach.

  • Coding. Pure and simple, the CTO needs a strong technical background. They should also be a productivity workhorse, able to deliver the throughput of 3 engineers.
  • Architecture. Beyond building early prototypes, the CTO must be able to lead the team to build things which fit together and scale with the company.
  • Communication. The CTO must be able to work with the business when capturing requirements and communicate them clearly to his team.
  • Management. The CTO will have a team reporting to him, and so must understand people management. As the team grows, he must also engage in annual reviews or other feedback processes.

So, what areas of expertise does a CTO need beyond the obvious?

  • Recruiting and Interviewing. Most startups need to grow the technical team. Few startups have dedicated HR professionals to help, so this often falls to the CTO. Even with support, it will be the CTO’s responsibility to interview candidates and often they must also make and close offers.
  • On-boarding, Knowledge Capture/Transfer. Once hired, the CTO must train other engineers. Even experienced hires will need some training on practices, policies, and how to work with the code. Few startups have any real plans to ramp up engineers or to capture knowledge in any medium other than engineers' heads.
  • Sprint Planning and Prioritization. The CTO needs to be able to plan work for others. At the core, he’s responsible for everything technical which is delivered and must ensure his team is working on the right things.
  • Product Management. Often the CTO wears this hat. Sometimes there is a cofounder or early hire who handles this, but very often it is the CTO. Even if the CEO is the “product manager”, it may actually be the CTO making most of the smaller decisions. Making the right decisions in a timely manner is an important skill.
  • Quality Control and Best Practices. The CTO must know how to build the checks along the way to minimize bugs and create a high quality code base. When you hear of companies needing to do a full rewrite on a product (which is not a prototype), it’s likely because their quality control failed.
  • Infrastructure Planning and Scaling. As a company grows, the infrastructure bill is going to grow. Eventually someone notices and the CTO must take responsibility for managing the costs. This requires evaluating how much things are costing, planning future infrastructure needs, negotiating with vendors, and prioritizing cost-saving work. Closely related is how the system scales and recovers from infrastructure failure.
  • Monitoring, Alarming, and Metrics. A production system is like a living thing. There is a constant barrage of requests from clients. Keeping track of what’s going on, identifying when a problem arises, and being notified of said problem in a timely and appropriate manner requires vision, planning, and policies.
  • Operations and Bug Tracking. With a system in production comes a constant stream of bugs (major and minor), small tweaks, routine deployments, routine fixes, latency spikes, rounding errors, etc., etc. Collectively called operations, it requires a dedicated skillset to manage in a sustainable fashion which does not impact product deliveries.

Tickwork Powered by AWS

My previous post introduced Tickwork which performs scheduling in Ruby, but doesn’t contain a clock. The intentionally missing component of the scheduling library means that something must regularly call the manager to drive the scheduling of events.

AWS recently introduced CloudWatch Events, essentially a cron in the cloud that does nothing on its own. We can use this to power Tickwork.

We can do this by having CloudWatch Events push to SNS, which will then call out to our Rails app with an HTTP POST. How much does this cost? $0. CloudWatch Events are free (as far as I can tell), and we are comfortably in the free tier for SNS. (100,000 HTTP calls per month are free, then just $0.60 per million after that. source)
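To put a number on "comfortably in the free tier", here is a quick back-of-the-envelope check. It uses the SNS figures quoted above; the one-tick-per-minute interval is my assumption for illustration (pick whatever interval your schedule needs), and the method name is made up.

```ruby
# SNS HTTP delivery pricing as quoted in this post.
FREE_HTTP_CALLS_PER_MONTH = 100_000
PRICE_PER_MILLION_USD = 0.60

# Hypothetical helper: monthly SNS cost in USD for a given number of HTTP calls.
def monthly_sns_cost(calls_per_month)
  billable = [calls_per_month - FREE_HTTP_CALLS_PER_MONTH, 0].max
  billable * PRICE_PER_MILLION_USD / 1_000_000
end

# Assumption: one tick per minute, in a 31-day month.
ticks_per_month = 60 * 24 * 31
puts ticks_per_month                   # 44640
puts monthly_sns_cost(ticks_per_month) # 0.0
```

Even at one call per minute we use less than half of the free allowance.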

To make this easy, I’ve created AWS Tickwork, an engine for Rails. The main part of AWS Tickwork is an SNS controller which makes receiving posts very easy. This controller simply calls Tickwork. The engine also contains an optional migration to use ActiveRecord as the datastore required by Tickwork.
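For a rough idea of what receiving an SNS post involves, here is a minimal sketch in plain Ruby. This is not the actual AWS Tickwork controller: `handle_sns_post` and the `on_tick` callback are hypothetical names, and the sketch skips signature verification, which a real endpoint should do. The `Type` and `SubscribeURL` fields come from the JSON body SNS sends to HTTP endpoints.

```ruby
require 'json'

# Hypothetical handler: inspects an SNS HTTP POST body and decides what to do.
# on_tick is any callable; in a real app it would drive the scheduler.
def handle_sns_post(raw_body, on_tick:)
  message = JSON.parse(raw_body)

  case message['Type']
  when 'SubscriptionConfirmation'
    # SNS sends this once; the endpoint must visit SubscribeURL to confirm.
    { action: :confirm, url: message['SubscribeURL'] }
  when 'Notification'
    # A regular delivery: treat it as a clock tick.
    on_tick.call
    { action: :ticked }
  else
    { action: :ignored }
  end
end

body = JSON.generate('Type' => 'Notification', 'Message' => 'tick')
p handle_sns_post(body, on_tick: -> { puts 'tick received' })
```

The point is that the controller itself stays tiny: confirm the subscription once, then turn every notification into a tick.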

A full app example using AWS Tickwork can be found in the spec directory. I’m also using this engine successfully in another project … maybe a future post.

Introducing Tickwork

Tickwork is a Ruby library which supports scheduling of jobs. It is essentially a fork of clockwork, but has been significantly simplified. The simplification comes mainly from the removal of the self-driving time engine. That is to say, tickwork requires external calls to move through time.

Why would you possibly want a scheduling library which can’t execute its own schedule? The typical way to run an application relying on clockwork is to have a process dedicated to running the schedule. This process basically just spins in a loop checking for work that needs to be done. There are a couple of drawbacks to this approach:

  1. It requires a separate process. On platforms like Heroku, that involves running an extra dyno (i.e. $$).
  2. It is vulnerable to missing jobs during restarts, deploys, or other failures. We had this problem at my old company, Thinknear. We scheduled critical jobs to run in the top half of the hour and would deploy in the bottom half of the hour, just to try to avoid missing a job. If a job is missed, there’s no built-in way to catch up.

Tickwork, on the other hand, does not require a separate process (though you may still choose to use one). Tickwork only moves through time (scheduling jobs) when told to do so. As a result, it is not vulnerable to missing work due to restarts, long running jobs, or other externalities. It uses an “at least once” approach vs. clockwork’s “at most once”. If a tickwork run is interrupted, say, due to a deployment, it never records finishing and so will re-run that period of time on its next invocation.
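To make the "at least once" idea concrete, here is a toy sketch of a scheduler that only moves when told to. It is not Tickwork's real API or implementation: `TickScheduler`, `every`, `tick`, and the store keys are all hypothetical, and times are plain integer seconds to keep it short. The important detail is that a job's timestamp is written only after the job finishes, so an interrupted run gets retried on the next tick.

```ruby
# Toy tick-driven scheduler (hypothetical names, not Tickwork's API).
class TickScheduler
  def initialize(store)
    @store = store # anything Hash-like; must be shared storage in real life
    @jobs = {}
  end

  # Register a recurring job that should run every period_seconds.
  def every(period_seconds, name, &block)
    @jobs[name] = { period: period_seconds, block: block }
  end

  # Called from outside (e.g. by an HTTP request) with the current time.
  def tick(now)
    @jobs.each do |name, job|
      last_run = @store["job:#{name}"]
      next if last_run && now - last_run < job[:period]

      job[:block].call(now)
      # Recorded only after the job finishes: if the job was interrupted,
      # nothing is written and the next tick runs it again.
      @store["job:#{name}"] = now
    end
    @store['master'] = now
  end
end

runs = []
scheduler = TickScheduler.new({})
scheduler.every(60, 'report') { |now| runs << now }
scheduler.tick(0)  # first tick: job runs
scheduler.tick(30) # not due yet
scheduler.tick(90) # due again
p runs             # [0, 90]
```

The flip side of this design is that a job may occasionally run twice, so jobs should be safe to repeat.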

This robustness does come at a cost. Unlike clockwork, tickwork requires a datastore. It needs only a small amount of data: one master timestamp and one timestamp per recurring job. Tickwork is compatible with ActiveSupport::Cache::Store, the only caveat being that it must be a shared cache across all application hosts. Creating a database table would also be trivial (coming in a future blog post).
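As a rough illustration of how little the datastore has to do, here is a sketch of an in-memory store exposing `read` and `write`, the two basic methods of ActiveSupport::Cache::Store. I am assuming those two methods are all a scheduler like this needs; the class name and keys are made up. Being in-memory, it lives in a single process, so it does not meet the shared-cache caveat and is only good for tests or local experiments.

```ruby
# Hypothetical in-memory store with a read/write interface modeled on
# ActiveSupport::Cache::Store. Single process only: NOT shared across hosts.
class InMemoryTickStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  # Returns the stored value, or nil if the key was never written.
  def read(key)
    @lock.synchronize { @data[key] }
  end

  # Stores the value and returns true.
  def write(key, value)
    @lock.synchronize { @data[key] = value }
    true
  end
end

store = InMemoryTickStore.new
store.write('master_timestamp', 1_700_000_000)
store.write('job:report', 1_700_000_000)
p store.read('master_timestamp') # 1700000000
p store.read('job:unknown')      # nil
```

In production you would swap this for something every application host can see, such as a shared cache or a database table.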

As a last note, clockwork supported dynamic scheduling of jobs via the database. This feature no longer exists in tickwork.