How data scientists answer crucial business questions
2,400 years ago…
A particularly inquisitive fellow named Socrates imagined a world where prisoners were born and held captive in caves for their entire lives. These prisoners were chained up and forced to stare at a blank wall with the only source of light being a small fire directly behind them. Without the ability to turn their heads, the entire reality of these prisoners consisted of what they could see on the wall in front of them. If a figure were to walk by the fire behind the prisoners and cast a shadow on the wall, the only thing that the prisoners would be able to see would be the shadow on the wall. Since the prisoners would be completely unaware of anything outside their field of vision, they would be forced to conclude that the silhouette on the wall and the figure itself were one and the same.
Socrates further wondered what would happen if one of those prisoners were to leave the cave. Upon exiting the cave, the free man would quickly be overwhelmed by the flood of new sensory information. At first, this new, brightly lit world would be daunting and disorienting, but after acclimating, the former prisoner would realize that the shadows he had seen in the cave were only his perception of objects with depth and details that he couldn’t have fathomed in his old life in the cave. This access to new information would allow the prisoner to view the world and ask questions in ways that were inconceivable before he stepped into the light. The new information would allow the prisoner to explore and hypothesize and intuit truths about the world around him. Socrates concluded that such a person would follow in his and Plato’s footsteps and become a philosopher.
24 seconds ago…
Over 26 terabytes of data were created in a single second, which is a truly astounding amount of data. And every second since then, another 26 terabytes of data have been created. It’s estimated that 2.5 quintillion bytes of data (2.5 million terabytes, or 2.5 trillion copies of Plato’s Republic) are created each and every day. With this volume of data being produced daily, trying to consume, examine, and evaluate all of it is a daunting and disorienting proposition. It’s unclear where to even begin with the possibilities that big data provides. However, there is clearly value to be gained somewhere in that mountain of data, so it’s no wonder that some enterprising individuals have tried to sift through the depth and details of the data to identify patterns and insights about the world around them. By using techniques with foundations in statistics and computer science, these individuals would become data scientists.
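The per-second figure can be checked against the daily estimate with a little arithmetic. The sketch below takes the 2.5 quintillion bytes per day quoted above as given (it is an estimate, not a measurement). It suggests that the 26 figure matches binary terabytes (2^40 bytes), while decimal terabytes (10^12 bytes) give roughly 29. The implied size of one copy of the Republic is an inference from the numbers quoted, and all variable names are illustrative.

```python
# Back-of-the-envelope check of the data-volume figures quoted above.
BYTES_PER_DAY = 25 * 10**17          # 2.5 quintillion bytes (the quoted estimate)
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400 seconds

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
tb_per_second_decimal = bytes_per_second / 10**12   # 1 TB  = 10^12 bytes
tb_per_second_binary = bytes_per_second / 2**40     # 1 TiB = 2^40 bytes

tb_per_day = BYTES_PER_DAY // 10**12                # daily volume in decimal TB
republic_copies = 25 * 10**11                       # 2.5 trillion copies
bytes_per_copy = BYTES_PER_DAY // republic_copies   # implied size of one copy

print(f"{tb_per_second_decimal:.1f} TB/s (decimal), "
      f"{tb_per_second_binary:.1f} TiB/s (binary)")
print(f"{tb_per_day:,} TB per day, {bytes_per_copy:,} bytes per copy")
```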
Admittedly, the comparison might sound highfalutin (don’t all data science articles start with references to ancient philosophers?), but it has merit. At their core, both philosophers and data scientists try to take the messy world around them and make sense of it. Rather than starting with the data and uncovering what they can, both philosophers and data scientists start with questions and try to use the information available to answer them.
Philosophers use what they can see to try to understand why things are the way they are. Data scientists use data to try to understand how different data points relate. Philosophers ask questions about the world around them and try to intuit answers. Data scientists, on the other hand, are less interested in the metaphysical. Rather than focusing on heady questions about huge topics like Existence, data scientists trade questions like “Are physical objects merely imperfect recreations of non-physical essences of things?” for problems like “How can I classify these different data points into similar groups?” And rather than relying solely on logic and intuition to reach their conclusions, data scientists also use statistical techniques, methodologies, and best practices to demonstrate that the results of their models are meaningful and statistically significant, trading “Cogito ergo sum” for sums of squared errors.
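To make the grouping question concrete, here is a minimal sketch of one common technique, k-means clustering, which sorts points into groups by minimizing the within-group sum of squared errors. This illustrates the general idea only and is not a description of any model used at Arity. The function name, sample values, and starting centers are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from caller-supplied centers."""
    centers = list(centers)

    def nearest(value):
        # Index of the center with the smallest squared distance to value.
        return min(range(len(centers)), key=lambda i: (value - centers[i]) ** 2)

    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            clusters[nearest(value)].append(value)
        # Move each center to the mean of its points; keep it if it has none.
        centers = [
            sum(points) / len(points) if points else centers[i]
            for i, points in enumerate(clusters)
        ]

    labels = [nearest(value) for value in values]
    # Within-cluster sum of squared errors: the quantity k-means minimizes.
    sse = sum((value - centers[label]) ** 2 for value, label in zip(values, labels))
    return centers, labels, sse


# Hypothetical sample: two visibly separate groups of points.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, labels, sse = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, labels, sse)
```

A lower sum of squared errors means the points sit closer to their group centers, which is one way to compare candidate groupings.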
Another similarity between philosophers and data scientists is in their approach to answering these questions. Philosophers might employ the Socratic method to question all their assumptions and ensure that their logic holds up to scrutiny. Similarly, data scientists scrutinize their models by using holdout datasets to confirm that their findings hold when applied to new data.
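A minimal sketch of holdout validation follows, assuming a simple straight-line model and synthetic data generated only for illustration. The model is fit on one portion of the data and then scored on rows it never saw. If the error on the holdout rows is close to the error on the training rows, the finding is more likely to hold on new data. The function names and the 80/20 split are assumptions, not a prescribed method.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Sum of squared errors divided by the number of points."""
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sse / len(xs)


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of rows and set aside a fraction the model never sees."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
noise = random.Random(42)
rows = [(float(x), 2.0 * x + 1.0 + noise.gauss(0, 0.5)) for x in range(100)]

train, holdout = holdout_split(rows)
train_x, train_y = zip(*train)
holdout_x, holdout_y = zip(*holdout)

# Fit on the training rows only, then score on both sets.
slope, intercept = fit_line(train_x, train_y)
train_mse = mean_squared_error(train_x, train_y, slope, intercept)
holdout_mse = mean_squared_error(holdout_x, holdout_y, slope, intercept)
print(f"train MSE: {train_mse:.3f}, holdout MSE: {holdout_mse:.3f}")
```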
Lastly, philosophers and data scientists share a healthy skepticism about whether these questions can even be answered. Philosophers are driven by the pursuit of knowledge rather than by finding a specific answer. Data scientists are no different in this regard. As famed statistician George Box once said, “All models are wrong, but some are useful.” So, data scientists aren’t focused on making the perfect model (it doesn’t exist) but instead are trying to develop techniques and insights that allow them to interpret the world of data around them in a way that makes just a little more sense.
Today…
At Arity, we’re collecting and compiling billions of miles of driving data each and every day. Our data scientists ask questions and dive into the data to try to find answers. But our data scientists can’t possibly ask every question that could be addressed with the volume of data that we have available, so we turn to our business partners and customers to understand what questions and problems they’re trying to solve.
Take for example our partnership with GasBuddy, which operates a mobile app focused on finding real-time fuel prices at gas stations so that their app users can efficiently fill up their tanks. GasBuddy was focused on increasing user engagement for their mobile app, so they turned to Arity to partner on developing a way to drive user engagement. To come up with a proposed solution to this problem, something like the dialog below took place:
Pupil: How can we drive app engagement for GasBuddy users?
Socrates: In order to drive engagement, we need to understand what is important to GasBuddy users. So, what do we know about GasBuddy users? Is there something unique that their users tend to have in common?
Pupil: GasBuddy users, by definition, have all downloaded the GasBuddy app, so they clearly aren’t like a random sample of the population.
Socrates: Good. What else might GasBuddy users have in common? Why did they download the app in the first place?
Pupil: They are all likely interested in gas prices.
Socrates: That makes sense. Why are GasBuddy users interested in gas prices though?
Pupil: They probably downloaded the app with the intention of using that information to save money at the pumps.
Socrates: That seems reasonable. What feature could we create that would enable GasBuddy users to save money on gas?
Pupil: Beyond compiling gas prices from individual stations, which GasBuddy already does, a different way to save money on gas would be to reduce gas consumption of their app users. Could we create a fuel efficiency model? What data would we need to create such a model? Does Arity currently collect data that could be used to develop a fuel efficiency model?…
Using this type of dialog and back-and-forth questioning, Arity and GasBuddy were able to tie a business problem (how to drive app engagement) to a potential solution (a fuel efficiency model) that leverages available data. After this process of inquiry, Arity was able to look at the gobs of driving data it already had through the lens of predicting fuel efficiency. With some development and implementation work and continued partnership with our stakeholder, Arity was able to create and implement the fuel efficiency model that is now available to all GasBuddy users in their app.
Fuel efficiency is just one of the ways that Arity is seeking to provide value for its customers. Other questions that Arity is trying to answer are:
Are you curious about helping businesses reach more customers more efficiently?
Imagine helping marketers and advertisers to identify and target customers based on attributes rarely available in years past. Our analytics team works to improve things like:
- An auto-repair shop targeting customers based on their likelihood of being in a particular type of collision.
- A quick-service restaurant targeting customers with a special deal on breakfast, knowing they’re likely to be passing by on their morning commute.
Are you looking to create rich experiences that offer unique insights to users and drive app engagement?
You could join our team that focuses on:
- Analyzing incentive programs based on users’ mobility behaviors, in effect gamifying their driving
- Giving users the ability to better understand their own driving patterns through data dashboards, highlighting habits they might not even be aware of themselves
- Helping prospective home buyers understand the safety of the streets near a house before making a purchase
Do you want to transform the insurance industry so that customers pay fair insurance premiums truly based on their driving behavior?
Based on the vast amounts of data we collect, Arity has the ability to help auto insurers:
- Precisely quantify the driving risk of each driver, a measure that is reliable and more intuitively related to insurance risk than the demographic attributes used by most insurance rating algorithms
- Potentially unlock the key benefit of assistance in claims filing with technologies like collision reconstruction, which could fundamentally change the way insurance claims are handled
Here at Arity, we are customer focused, and we’re continually looking for ways to strengthen our customer relationships. Imagine knowing not just when a customer is in a time of need, but what type of support you can offer during these significant events in their lives. That could be in the moments following a collision or a flat tire. It could even be a life-saving difference. In the future, it could also mean avoiding that moment of need altogether, like giving pedestrians and joggers information on the safest routes to avoid risky drivers.
Beyond interesting business problems and the value we can deliver to our customers, Arity fully recognizes the demanding technology requirements of developing data science products. That is why Arity has built a powerful, reliable, and flexible analytics environment for data scientists.
We have fully adopted Amazon EMR, a cloud big data platform on AWS, for running large-scale distributed data processing jobs and machine learning (ML) applications. Whatever programming language or framework you are comfortable with (such as Python, R, Spark, Scala, or Java), and whatever IDE you prefer (Jupyter Notebook, RStudio, etc.), we support it on our EMR clusters. As for ML model deployment, we have streamlined the different processes for batch, streaming, and on-the-edge model deployment, using state-of-the-art tools like Airflow, Flink, TensorFlow, MLflow, and so on.
We have also built a rigorous framework for post-deployment monitoring and lifecycle management for all our models. Arity’s full suite of technology solutions enables data scientists to focus on our core business value while providing enough flexibility for them to innovate and experiment with new ideas.
These are just some of the questions and tools that Arity data scientists work with each day. Arity uses its data to solve all sorts of problems, from advertising to car insurance to auto manufacturing and repair. With that said, there is just one more important question to ask: are you interested in leaving the cave and asking and answering interesting questions using Arity’s data?
If so, check out our Careers page or reach out to one of us directly: Brian Faber or Yicheng Wen.