How do I get a job in data?
Data is an attractive field right now, especially for “outsiders” who are interested in entering tech but don’t want to do engineering. It’s a technical skill that can serve as a career foot in the door on the revenue-oriented side of the company (the “business people”). This is important because unless you have the “right” brand names on your resume, it’s really hard to get a job in tech. The justice or injustice of it is beyond the scope of this post. The important thing is getting the job.
Data is actually an ideal foot in the door because proficiency is based roughly equally on three things:
- Knowledge of SQL
- Years of experience working with data
- Years of experience working with that company’s schema and reporting
This is great for novices because you do not need to be an insider to learn SQL. By this reckoning, even an infinitely experienced analyst can only be about twice as productive as you. That’s insane. I’ve been coding for 2-3 years now, but when I meet engineers who have been writing code for 6+ years, they are at least 4x more productive than me[1]. The fact that you can become 50% as productive as an analyst with 4 years of experience is ideal. Plus, the last third of proficiency is hyper-local to the actual company, so an experienced analyst from elsewhere doesn’t have much of an advantage over you. If you’re willing to outwork them and exceed them on that last point, you can be their equal within a year.
Data has a logarithmic learning curve[2], so you can actually make a fair amount of progress within the first 100 hours, provided you have clear objectives, tools, and material. Focus is really important here. Start with Postgres. It’s easy to install on your laptop, which is key because one of the most annoying parts of learning SQL is loading the damn data. Also, Redshift, which has the biggest piece of market share, bases most of its syntax on Postgres.
It’s especially important to be able to easily create and load sample data. You don’t just want to learn the keywords and functions: you want to apply them against a dataset. I’d recommend generating your own in Sheets or Excel because Googling “sample dataset” will get you datasets that are gigabytes big and tables with dozens of columns and weird scripts…ugh, yeah. Just write your own.
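If you’d rather script it than type rows into a spreadsheet, here is a minimal sketch of the same idea in Python. Everything in it is made up for illustration: the `signups` table, its four columns, the promo codes, and the roughly 30% conversion rate are my assumptions, not anything standard.

```python
import csv
import io
import random
from datetime import date, timedelta


def make_signups(n_rows=50, seed=42):
    """Return n_rows of made-up signup records as a list of dicts."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    start = date(2024, 1, 1)
    rows = []
    for user_id in range(1, n_rows + 1):
        signup = start + timedelta(days=rng.randint(0, 89))
        # Assumption: about 30% of users pay, 0 to 30 days after signing up.
        paid = None
        if rng.random() < 0.3:
            paid = signup + timedelta(days=rng.randint(0, 30))
        rows.append({
            "user_id": user_id,
            "signup_date": signup.isoformat(),
            "paid_date": paid.isoformat() if paid else "",
            "promo_code": rng.choice(["", "", "LAUNCH10", "FRIEND"]),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_csv(path, rows):
    """Write the rows to a CSV file that a database can import."""
    with open(path, "w", newline="") as f:
        f.write(to_csv(rows))
```

Calling `write_csv("signups.csv", make_signups())` gives you a small file you control. In Postgres, once you have created a matching table, something like psql’s `\copy signups FROM 'signups.csv' WITH (FORMAT csv, HEADER true)` should load it. The blank cells come through as NULLs under the default CSV settings.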
Understanding the business behind the numbers is more subtle, because data people aren’t just expected to “crunch the numbers.” You’re responsible for them. So when you’re asked to pull a number, you need to understand the subtleties behind it. When someone asks for signup-to-paid conversion, you need to consider what conversion might mean for this business. How do promo codes factor in? What are the time horizons for conversion? Is it a marketplace? Then think about buyer and seller metrics.
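To make those questions concrete, here is a rough sketch of how one “conversion” number turns into several, depending on the time horizon and whether promo signups count. It uses Python’s built-in `sqlite3` as a stand-in so it runs anywhere, not Postgres. The `signups` table, its columns, and the five rows are all invented for the example.

```python
import sqlite3

# Made-up rows: (user_id, signup_date, paid_date, promo_code)
ROWS = [
    (1, "2024-01-01", "2024-01-03", None),
    (2, "2024-01-01", None, None),
    (3, "2024-01-05", "2024-02-20", "LAUNCH10"),  # paid 46 days after signup
    (4, "2024-01-10", "2024-01-12", "LAUNCH10"),
    (5, "2024-01-15", None, "FRIEND"),
]


def build_db(rows=ROWS):
    """Create an in-memory database holding the hypothetical signups table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signups ("
        "user_id INTEGER PRIMARY KEY, signup_date TEXT, "
        "paid_date TEXT, promo_code TEXT)"
    )
    conn.executemany("INSERT INTO signups VALUES (?, ?, ?, ?)", rows)
    return conn


def conversion_rate(conn, window_days, include_promo=True):
    """Share of signups that paid within window_days of signing up."""
    sql = """
        SELECT
            COUNT(*) AS signups,
            SUM(CASE WHEN paid_date IS NOT NULL
                      AND julianday(paid_date) - julianday(signup_date) <= ?
                     THEN 1 ELSE 0 END) AS converted
        FROM signups
        WHERE (? = 1 OR promo_code IS NULL)
    """
    signups, converted = conn.execute(
        sql, (window_days, int(include_promo))
    ).fetchone()
    return converted / signups if signups else 0.0


conn = build_db()
print(conversion_rate(conn, 30))                       # 30-day window, everyone
print(conversion_rate(conn, 60))                       # 60-day window, everyone
print(conversion_rate(conn, 30, include_promo=False))  # 30-day, no promo signups
```

Same table, three different answers. `julianday` is SQLite-specific. In Postgres, with real `date` columns, the day difference would be plain date subtraction (`paid_date - signup_date <= 30`).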
This will come up in an interview. If you’re interviewing at a SaaS company, know the basic SaaS metrics, like MRR, LTV, and CAC. Google “SaaS metrics cheatsheet” and think about how you’d calculate each. Even if the recruiter tells you not to worry about it, this is an easy way to blow people away and tip the scales in your favor.
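As a starting point for that exercise, here is a sketch of the simplest textbook versions of those three metrics. Definitions vary a lot between companies (some multiply LTV by gross margin, for example), so treat these formulas and the made-up numbers as assumptions to question, not as the definitions any particular company uses.

```python
def mrr(monthly_fees):
    """Monthly recurring revenue: sum of monthly fees across active subscriptions."""
    return sum(monthly_fees)


def cac(sales_marketing_spend, new_customers):
    """Customer acquisition cost: spend divided by customers acquired in the period."""
    return sales_marketing_spend / new_customers


def ltv(avg_revenue_per_account, monthly_churn_rate):
    """Simplified lifetime value: average monthly revenue per account over monthly churn."""
    return avg_revenue_per_account / monthly_churn_rate


# Made-up inputs for illustration only.
fees = [50, 50, 100, 200]          # four active subscriptions
total_mrr = mrr(fees)              # 400
arpa = total_mrr / len(fees)       # 100.0 per account
lifetime_value = ltv(arpa, 0.05)   # roughly 2000 at 5% monthly churn
acquisition_cost = cac(6000, 12)   # 500.0 per new customer
print(total_mrr, round(lifetime_value), acquisition_cost)
```

The useful interview habit is less about memorizing the formulas and more about asking what goes into each input: which subscriptions count as active, which spend counts as acquisition, and over what period churn is measured.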
I consider this post to be a work in progress. As I see more common questions, I’ll update it as needed. If you’re reading this and have questions, feel free to @ me on Twitter.
[1] I’ve actually been curious about this. In writing code, I’m probably only half the speed of a very experienced engineer. But I spend more time in code review and fixing bugs than they do. I’d love to dump GitHub data into a database someday and crunch this out.
[2] This is unlike software development, which, from what I can tell, has an S-shaped curve that inflects at around 4-6 years, depending on the person and the type of engineer. At around 10 years in, you might still be learning, but your individual effectiveness seems to plateau. Hence most great engineers take on more of a leadership position (either technical or managerial) at that point, instead of writing code.