Hook your next fish — how to write the perfect data science white paper

Peter Sellers as Dr Strangelove, emphasising the importance of publicising a successful project.

‘The whole point of the [project] is lost if you keep it a secret. Why didn’t you tell the world, eh?’ — Dr. Strangelove, 1964.

One of the best things about doing exciting work is telling other people about it. Apart from a warm glow of self-satisfaction, when you impress others with your past work there is a decent chance that more exciting work may come your way.

A key way to do that is to write a whitepaper. A whitepaper is a marketing document that aims to showcase the author’s expertise in a particular area. Typically the author will either explain how they solved a problem with their expertise, or teach some basic aspects of their field, with the aim of helping the reader understand when it’s time to call the experts. For example, a tradesperson might share tips for some very small jobs, leading up to the point at which the reader should call in the professionals.

There are a great many guides to writing white papers throughout the internet, for example here, often including a guide to structure. In the case of data science, though, there is a twist, which is that usually the author is using their data science expertise to solve a problem in an area where the reader is an expert. This has a small but noticeable effect on the way that the document needs to be structured, and how to approach the audience.

A first task will be to establish credentials in the reader’s problem domain. As you are unlikely to have higher qualifications or more experience in that area than the reader, straightforwardly offering up your own credentials is unlikely to succeed. Instead, the best path forward is likely the ‘show don’t tell’ approach, often seen in creative writing classes. In that context, it refers to allowing the reader to see your characters in action and their story unfold, rather than writing out their traits or outlining the plot. In this context it means explaining the domain problem you worked on in a way that leaves no doubt of its importance to the field. You wouldn’t be working on it if a solution wasn’t valuable, so explain where the value lies; often the problem is a roadblock on the way to a bigger target. Overall, demonstrating that you understand how the problem affects their business allows you to win the audience over.

Once you’ve established the problem, the next step in the story will be how you solved it. In the context of data science two tools will commonly be needed to obtain a solution — an adequate data set (‘adequate’ because most data sets fall far short of our ‘ideal data set’) and suitable analysis tools.

Given that so many data science tools are open source, there is a reasonable chance that the data set, if not in its original state then after the cleaning and pre-processing you’ve performed on it, represents an advantage over competitors. Hence, mentioning the way the data was obtained or cleaned may be useful to further establish credibility. This is especially the case if you used advice from subject matter experts to improve the pre-processing, for example if the collection process explained why data were missing and that explanation determined how the missing data were treated.
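As a minimal sketch of that last point, assume a hypothetical sensor data set where a subject matter expert has explained that the logger records nothing when a reading falls below its detection limit. Under that assumption a gap is informative rather than random, so it is filled with a value tied to the limit instead of the mean of the observed readings. The function names, the detection limit and the half-limit substitution rule are all illustrative assumptions, not taken from a real project.

```python
from statistics import mean

# Hypothetical detection limit supplied by a subject matter expert.
DETECTION_LIMIT = 0.5


def fill_naive(readings):
    """Replace missing readings (None) with the mean of the observed ones."""
    observed = [r for r in readings if r is not None]
    fill = mean(observed)
    return [fill if r is None else r for r in readings]


def fill_with_domain_rule(readings, limit=DETECTION_LIMIT):
    """Replace missing readings (None) with half the detection limit.

    Assumes (hypothetically) that a gap means the true value was below
    the limit, so the mean of the observed values would overstate it.
    """
    return [limit / 2 if r is None else r for r in readings]


readings = [2.0, None, 4.0, None, 6.0]
print(fill_naive(readings))             # gaps become the observed mean
print(fill_with_domain_rule(readings))  # gaps become limit / 2
```

In a whitepaper, one sentence on why the second rule was chosen, and who suggested it, is probably enough; the code itself would stay out of the document.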

When discussing the algorithm used, it’s not just a question of correctly tailoring the discussion to a non-technical audience but also a question of pacing. To maintain your readers’ attention, the whitepaper needs to have the feel and pacing of an unfolding story; too much detail on how the algorithm works and how you applied it will slow the pacing and put off the reader. Crucially, the reader does not need to come away with a complete understanding of the algorithm for your message to get across. It is closer to the truth to say that any description of the algorithm provides colour and interest rather than a true explanation of how the algorithm works.

Applying your algorithm to data represents the second act in your three-act story. Here, the solution itself may not be the selling point, as important as it is. When you’re implementing something similar to a predictive model, the selling point will often be what you observed about your data along the way: an extra lesson about the way the variables interact with each other, or a surprise about which variables are the most influential or about the shape of a relationship. If necessary, extend your analysis to be a complete inferential analysis (IMHO you should do that anyway, but that’s a topic for another day).
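One crude way to get a first look at which variables matter is to rank them by the strength of their linear association with the outcome. The sketch below does this with a hand-rolled Pearson correlation on made-up numbers. The variable names and data are hypothetical, and correlation is only a stand-in for influence here: it is not a full inferential analysis and says nothing about interactions or non-linear shapes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_by_association(features, target):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up illustrative data; the names are hypothetical.
features = {
    "price": [1, 2, 3, 4, 5],
    "footfall": [3, 1, 4, 1, 5],
}
sales = [10, 8, 6, 4, 2]

for name, r in rank_by_association(features, sales):
    print(f"{name}: {r:+.2f}")
```

A ranking like this is a prompt for a conversation with the domain experts, not a finding in itself; the surprises worth writing up are the ones that survive their scrutiny.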

If you frame it correctly, your whitepaper will make people remember you and associate you with their field. One of the biggest barriers to acceptance of data science solutions is a feeling that data science is usurping expert knowledge; the whitepaper represents a golden opportunity to show that data science is not a usurper, but complementary to expert knowledge.

This piece follows from this piece and this piece, which relate to earlier stages of the data science sales cycle. A rundown of how this piece fits with intended future articles can be found here at my blog.

Hook your next fish — how to write the perfect data science white paper was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
