SQL for product managers – the definitive guide

SQL for Product Managers

Product management is about taking your product in the right direction with the fastest speed possible. Data is always going to be an integral part of your decision-making, and having the ability to do the primary analysis yourself can definitely boost your speed as a product manager.

In this guide, we will understand what is SQL and how product managers can use it to make better decisions, faster.

Here is the index if you want to jump to a particular section:
Introduction
Do product managers need to know SQL?
SQL & Relational databases
Retrieving data with SQL
Filter data with SQL
Using LIMITS & SORTING in SQL
Using Aggregate & Group BY
Introduction to Joins
Where to write SQL queries?
Best practices for using SQL for product managers
Common questions you can try answering with SQL

Depending on the type and scale of the organization that you are working for, as a product manager you’ll have access to a lot of free and paid tools available for data analysis, ranging from Google Analytics, Mixpanel to Tableau, or PowerBI. 

While these tools will add up to your productivity and analysis skills, it’s better as a product manager to be as close to your data as possible. In most of the applications that you’re going to work with, the data would be stored in something known as relational databases.

Once you develop a decent understanding of how these work and how you can fetch the data and insights from these databases you’ll get a better command of your product’s usage data. 

This understanding will help you to make decisions faster as a product manager because now you do not have to depend on your data team to carry out basic analysis for you and also you can guide your development team to record any additional data that would help you in improving the product. 

Do product managers need to know SQL?

In summary, here are 3 reasons why product managers should learn SQL:

  • In-depth understanding of your data, to meaningfully contribute to data decisions.
  • Ability to uncover insights that are not available with standard tools such as GA, mixpanel etc.
  • Make decisions faster by reducing dependency on your data teams for basic analysis and using them for better challenges.

What is SQL?

SQL which stands for ‘Structured Query Language’, is a computer language that is designed to retrieve and manage data in a relational database. 

Thus to understand SQL you need to learn three things:

  • What is a relational database
  • How data is stored in these databases
  • How do you get the data that you want from these databases

 A relational database is a collection of tables where data is stored in rows. Think of it like a google spreadsheet, the whole spreadsheet will be called as a relational database and the individual sheets within the spreadsheet will be called as tables. To enter any new data we insert a row into any of the sheets and the information about that data is revealed by column names. 

Here is a simple visualisation of sheets into relational database:

SQL for Product Managers - HelloPM.co

These tables could be related to each other through some common columns (also known as foreign keys). Lets consider a very basic example from instagram to understand this, a simplified database structure (AKA data-schema) for instagram could look like this:

Table 1: Users
Columns: id, first_name, last_name, handle, registered_at, email_id

Table 2: Photos
Columns: id, user_id, photo_url, created_at, caption

Table 3: Likes
Columns: id, user_id, photo_id, created_at

As per this schema, whenever someone registers on Instagram, a new row is added, the person is assigned an automatically generated unique id (id) and other information (first_name, last_name, handle, registered_at, email_id) about them is recorded in the users table.

Whenever someone uploads a photo on instagram a row in photos table is added, which will contain the photo url, time of posting and caption along with the user_id to keep a track of which user has uploaded this photo,

When someone likes this photo, one new row is entered in the likes table with the user_id of the person who has liked the photo, the id of the photo which is liked and the time when the photo was liked.

Now if you have been following closely till now, you would have identified that the photos table is connected to the users table through user_id, the likes table is connected to the users and photos table through user_id and photo_id respectively. 

The user_id in the photos table, the photo_id and user_id in the likes table are examples of foreign keys. 

The ability to create these kinds of relationships between data tables is the real superpower of relational databases, and you can model the data for almost any product into a relational database using multiple tables with such relations between them.

The example we have taken here is extremely simplistic and in real-life instagram might be using hundreds if not thousands of tables to model and store their data.

While this may not be under your responsibility as a product manager to decide or create the data-schema, it is an added advantage to have a clarity of what all data do you store, so that you are aware of what all insights you can drive from your data.

Now that we know how data is stored in these tables, lets understand how we can retrieve data from these tables to drive insights.

This is where SQL comes into play. Simplistically speaking, SQL is the high-level language to give commands or ask questions to your database. 

Retrieving data with SQL

While you can use SQL to read, insert, update or delete data into relational databases, as a product manager your scope of work will always be limited to reading data.

Now let us understand how we can make some basic read queries to our database through SQL.

We will take examples from our instagram schema only.

If you want to see some data from any tables you will use the popular ‘SELECT’ query.

The syntax of query is as follows:

SELECT {column name 1, column name 2} from table name

If you want to retrieve all the columns you can replace {column names} with “*”.

Using * could be particularly useful for product managers when they want to see the list of information that is stored in an existing table.

Thus, to retrieve data from the users table in our instagram database, we will use:

SELECT * from users;


Filtering data

Now if you want to filter your data by some criteria, you can use the ‘WHERE’ clause, for example you want to name and date of registration of the user whose email address is ‘john.doe@gmail.com’, for such as case you will use:

SELECT first_name, last_name, created_at, email_id
FROM users 
WHERE email_id = ‘john.doe@gmail.com

Apart from = condition, you can also use conditional statements such as greater than (>), greater than equal to (>=), less than (<), less than equal to (<=) or like to further define your filters.

You can also combine multiple WHERE conditions to further define your data requirements. The combination can be done by using ‘AND’ when both conditions are essential or by using ‘OR’ when only one of the given conditions are required to be correct.

Here are some example queries to help you understand this better:

Get List of people who have registered in march 2021

SELECT * from users where created_at >= ‘2021-03-01’ AND created at <= ‘2021-03-31’

Get list of people whose first names are either Ankit or Ankur

SELECT * from users where first_name = ‘Ankit’ OR first_name = ‘Ankur’

Get list of people who use gmail.com as their email provider

SELECT * from users where email_id like ‘%gmail.com%’

If you have a range of items to check from, you can use ‘WHERE IN’ clause, for example: to get list of people whose email addresses are ‘a@gmail.com’, ‘b@gmail.com’, and ‘c@gmail.com’ you can use following query:

SELECT * from users WHERE email_id IN (‘a@gmail.com’, ‘b@gmail.com’, ‘c@gmail.com’).

That was about 80% of the knowledge you need to have on filtering data, if you are interested in learning more please go to tutorialpoint to expand your SQL knowledge.

Apart from filters there are two more important things which you as a product manager should know: How to sort the data and how to limit the amount of data you want to retrieve.

LIMITS & Sorting

  1. You can sort the data by using ORDER BY clause
  2. You can limit the number of rows you want to see by using the LIMIT clause.

Here are some examples:

This query will give you list of users, with recently registered users on the top (descending order of date of registration):

SELECT first_name, email_id, created_at
FROM users
ORDER BY created_at DESC

This query will give you 10 recently registered users:

SELECT first_name, email_id, created_at
FROM users
ORDER BY created_at DESC
LIMIT 10

As a product manager you should almost always use LIMIT in your queries, because omitting LIMIT will cause SQL to select all the available rows, which can affect the performance of your database or can even bring your database down if there are large numbers of rows in the table.

AGGREGATE & GROUP BY functions in SQL

SQL also offers something known as AGGREGATE functions, which can help you perform some calculations on data that you retrieve. The common AGGREGATE functions are: AVG, SUM, MIN, MAX, COUNT, SUM etc. Their functions are self-explanatory by their names, eg: COUNT will give you the count of all the rows which satisfy a particular condition and SUM will give you the SUM of all values of a particular column in a result-set.

Here is an example to illustrate the usage of the aggregation function COUNT:

To get all the likes of a particular photo on instagram, you will use this query:

SELECT count(*), photo_id FROM Likes where photo_id = {photo_id}

To get the number of photos liked by any particular user (lets say user_id = 20) you will use a query like this:

SELECT count(*), user_id FROM Likes where user_id =20

You can read more about different types of aggregate functions here 

Another important command to learn in SQL is GROUP BY clause. With group by clause you can arrange identical data into groups. When used together with AGGREGATE functions, GROUP BY can uncover some really helpful insights for you.

Lets understand with few examples:

To know the number of photos liked by each user on instagram we’ll use:

SELECT count(*), user_id from Likes GROUP BY user_id 

To know the number of photos posted by each user on instagram we’ll use:

SELECT count(*), user_id from Photos GROUP by user_id

JOINs in SQL – Using multiple tables together

Till now we have understood how you can filter, sort, limit, aggregate and group data. Now the next and the most interesting part of SQL: JOINS. Joins will help you connect two or more tables and unlock insights which are spanned across multiple tables.

To join two or more tables, you’ll need some common values (columns) between them, for example to know names of all the people who have liked photos we will have to join two tables Users & Likes, the common columns will be id column in users table and user_id column in Likes tables.

Here is what the query is going to looks like

SELECT first_name, last_name, photo_id
FROM users JOIN Users.id = Likes.user_id

While this definition is enough for some of the common use cases, you should definitely spend some time learning about different types of joins here.

You now are aware of the most common SQL queries and terms that a product manager should know about SQL. 

You now have practical knowledge of:

  • How to understand a data-schema
  • How to retrieve data from a relational database
  • How to filter data as per your requirements with WHERE, IN and conditional statements
  • How to Limit & Order your data for most efficiency
  • How to aggregate data with aggregate functions and using GROUP BY to make uncover important insights
  • How to use simple joins to get data which is available across multiple tables

Where do I write all of these queries?

One pressing question that will come to the minds of some readers is ‘where do i write all of these queries to get the data I want?’. For this you need to talk to your developers, if you are working for a decent tech company then you might already have a read-only database exposed for analytical usage, and that tool will most probably give you a window to write custom SQL queries like the ones we have studied above.

Some common free tools you can use to expose read-only databases are:

  • Redash
  • Metabase
  • PHPMyAdmin
  • MYSQL WorkBench
  • Heide SQL, etc.

Your developer should help you in setting up an environment or you can read documentation from these tools and work with developers to get it done for you.

Here is how metabase looks like:

Some analytics tools such as Google’s firebase (through bigquery) and Mode analytics give you the ability to run native SQL queries on their data-sets.

Best practices for product managers to work with SQL and native data

  • Invest time in understanding your data schema: if you don’t understand your data, your knowledge of SQL will do no good to you or the company. You can take help from an engineering manager or CTO to create documentation around this as this would be useful for other product managers and new developers as well.
  • Use LIMIT in your queries, your queries will be faster and you’ll save resources.
  • Do some sanity check on the output of your data, especially when you are using joins. Take help from the data team when in doubt.
  • Ensure that your data analysis is correct before presenting it to stakeholders, presenting wrong data is one sure shot way to losing trust. Take help from data team and do sanity checks [again].
  • Learn about common functions around how to work with dates in SQL. Dates are going to be an important part of your data analysis process and they behave differently than other data types. You can learn about common date functions here. Also, keep in mind the time zone in which your database is recording data.

Here are some questions which you can practice answering through SQL for your product:

  • Number of user registrations every month.
  • How many users have been acquired from which marketing campaign.
  • What is the distribution of transactions across your users.
  • How many users have filled their contact information.
  • How much money/time does your power user spends on the app as compared to new users.
  • On which day of the week do you get the most number of users.

There could be lots of such questions for your own product through which you can practise your newly acquired SQL skills and build a solid understanding of your data and customer experience.

That’s all folks!

I hope this guide will help you make better and quicker decisions as a product manager. Please share this guide with aspiring or existing product managers in your circle, I am sure they are going to thank you for this!

If you are looking to break into product management or want to upgrade your product management skills, you can go ahead and apply to be a part of the next Cohort of our Immersive 11-week live program on product management and growth from here.