What is the most effective way to structure a data science team?

From 2012 to 2017, I had the privilege to build the Data and Analytics organization at Coursera from scratch. Over that period of time, we experimented with a variety of different team structures as the company grew in size and the business evolved (see also this blog post or this Quora answer). Based on my experience, there’s no one “right way” to structure a data science team — what works best will be highly dependent on the context and growth stage of the company.

Although there aren’t any hard-and-fast rules, the core questions to keep in mind are generally:

  1. How should data scientist roles be defined?
  2. Where should data scientists report?
  3. Where should the data science function live?
  4. What should an organization do to set up data science for success?

Let’s go through these one by one.

How should data scientist roles be defined?

Borrowing from Michael Hochster’s taxonomy, there are generally two main types of data scientists:

  • Type A (“analysis”) data scientists focus on making sense of data through statistical analysis.
  • Type B (“building”) data scientists develop predictive models and algorithms to power data products.

Some companies create title distinctions between these two flavors of data scientists (e.g., Decision Scientist/Statistician/Quantitative Analyst vs. Data Scientist/Data Product Scientist/ML Engineer). Other companies try to distinguish Type A data scientists from product/business analyst roles that are often considered less technical. Although specialized titles arguably help with expectation setting, personally, it’s not clear to me that the clarity gained is worth the effort. In particular, for early-stage startups that need generalists, erecting artificial boundaries between highly overlapping roles often leads to murky, unproductive discussions around territory and scope.

Recommendation: For a startup, don’t waste time on distinguishing titles, but recognize that the individual data scientists you hire may come in all shapes and sizes (some may be better at analysis, others at building). Eventually, there may be valid reasons to clarify roles and responsibilities, but this can often wait until a company is larger.

Where should data scientists report?

In general, data science teams tend to adopt either a decentralized or centralized reporting structure.

  • Decentralized (or “integrated”) data science organizations have data scientists reporting to different functions or business units throughout a company. This structure commonly arises in larger organizations where data science initiatives have grown organically in multiple parts of the business. Decentralized organizations are often attractive from an accountability perspective since they give business units greater flexibility to control their own resourcing needs. Decentralization also promotes strong organizational alignment since data scientists are first-class members of product or business teams. This ensures that data scientists will have the context needed to work effectively with their business partners and the opportunity to develop meaningful personal relationships to get buy-in for ideas and initiatives. However, decentralization also creates a number of challenges. In order for a decentralized structure to work well, teams with data scientists need leaders who are capable of managing both engineers and data scientists. Data scientist mobility is more restricted in a decentralized organization, often leading to knowledge silos, fewer opportunities for peer mentorship, or limited career growth opportunities. Decentralization can also make it more difficult to enforce uniform hiring standards, invest in shared analytical infrastructure, or drive adoption of standardized analytical practices.
  • Centralized data science organizations have data scientists reporting into a single head of data science within a company. For startups, centralized teams tend to be more efficient headcount-wise due to flexibility in resource allocation. In these teams, data scientists have more opportunities to collaborate with their peers on a broad range of projects, which provides better career growth and technical mentorship opportunities. Structurally, centralization also simplifies hiring and recruiting, creates agency to drive company-wide analytical initiatives, and reduces knowledge silos. On the other hand, data scientists operating in a centralized team that is too far removed from the priorities of their business partners may lack the context or buy-in needed to be effective. In some cases, this can lead to an unhealthy dynamic where data science is treated as a support function, with data scientists answering questions from product managers rather than operating as true thought partners and proactively driving conversations from a data-informed perspective.

Most smaller companies tend to rely on a hybrid centralized/decentralized strategy that combines elements of the two strategies above. Typically, data scientists report centrally since recruiting and retaining talent is the primary bottleneck in building a data science team at the early stage. However, to ensure that data scientists are empowered to succeed, startups will often position data scientists to work closely with business units, a practice known as embedding.

In small data science teams, embedding can have the side effect of leaving data scientists overly isolated. At Coursera, we addressed this issue by forming small collaborative sub-teams (or “clusters”) of 2–4 data scientists that would partner with different functions/business units, as described in this blog post. A similar approach was used to scale data science at Airbnb.

Recommendation: For a startup trying to establish a strong data culture, centralized reporting is likely the easiest way to start. Use embedding to ensure that data scientists are working on projects that are valuable to the business, but beware of creating knowledge silos.

Where should the data science function live?

In some organizations, the data science function is a part of the engineering organization (e.g., LinkedIn). In other organizations, data science may be part of the product organization (e.g., Coursera) or may exist as an independent function directly reporting to the CEO (e.g., StitchFix). Sometimes, Type A and Type B data scientists may report to different groups (such as at Instacart).

There’s no single best place to put data science, and it’s fairly common for data science reporting structures to evolve over time even within the same company. The best placement depends on factors such as:

  • What avenues of communication are there for data scientists to influence product decisions?
  • To what extent are data scientists involved in the production engineering needed to develop machine learning systems?
  • What executive in the company will be able to advocate most effectively for data science?
  • To what extent is data science a core competency for the organization as a whole?

Recommendation: Regardless of where data science lives, prioritize fostering strong cross-functional collaborations since data scientists will almost always need to work closely with product managers, designers, marketers, engineers, and other product and business leaders throughout the company.

What do data scientists need from the rest of the company?

Much of the success of a data science team depends on how the team itself is structured and run. But much of it also depends on how other teams in the company interact with data scientists. In general, I’ve found three things to be important:

  • Data infrastructure engineering support. In order for data scientists to be effective, key tooling (e.g., data pipelines, instrumentation and tracking, A/B testing frameworks, and analysis environments) is critical. Therefore, it often makes sense for companies to prioritize hiring data infrastructure engineers before hiring their first data scientist, and it’s vital that data infrastructure engineering and data science teams work in close partnership.
  • Product and engineering managers who understand the complexities of building data products. The considerations involved in building data products are very different from those for regular products: working code is not sufficient to ensure that a data product delivers the right experience for the customer. Data quality problems are insidious, and debugging complex predictive models can be a real challenge. Deadlines don’t work in the same way either; unlike most software products, where cutting scope can be an acceptable strategy for delivering a project on time, cutting corners often doesn’t work when it comes to data products. Having a good intuition for the capabilities and limitations of machine learning is critical when prioritizing features. As a result, having product and engineering managers with relevant technical knowledge can help things go far more smoothly. In many cases, data scientists themselves are the most qualified to take on these types of roles.
  • Strong executive buy-in. Much of the influence of a data science team depends on the willingness of the executive team (and the CEO, in particular) to hold people accountable for fostering a strong data-informed culture. This is especially true for Type A data scientists, whose work often depends strongly on the willingness of product managers and business leaders to listen to data over the HiPPO (highest paid person’s opinion). In addition, building and maintaining the infrastructure needed to unlock data as a competitive advantage takes significant time and resources, so executives who want a strong data culture should be prepared to make the appropriate investment.

Recommendation: When building a data science team, make sure that the team is set up to succeed by considering the environment in which the team operates.

Finally, I’ll close by saying that the above recommendations are far from absolute but mainly just reflect my own experience at Coursera, through a process of trial-and-error over the past five years. Some of the most helpful online resources that shaped my thinking were:

  • Building an Analytics-Driven Organization: a white paper from Accenture that provides a nuanced description of different styles of analytics organizations.
  • Building Data Science Teams: an e-book by DJ Patil (who coined the term “data scientist”) that outlines the key elements of building teams that can deliver on great data products.
  • Analyzing the Analyzers: an e-book by Harlan Harris, Sean Murphy, and Marck Vaisman that takes a data-driven approach to understanding the various archetypes of data scientists found in the wild.

Over this time, I also had the good fortune to draw on the generous advice of amazingly helpful data science leaders at other companies, so if I can be of help, don’t hesitate to reach out.

Note: This blog post originally appeared as a Quora answer.

What is the most effective way to structure a data science team? was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
