The pace of data science adoption by businesses is speeding up. Increasingly, brands are applying graph-powered analytics to connected data to discover insights and power innovation.
As more businesses realise the value of connections, graph-powered data science will be an important part of the enterprise data scientist’s toolbox over the next decade. One challenge many teams encounter is understanding how to operationalise graphs in their existing ML practices. Adopters of graph data science report that machine learning can be deployed much more rapidly with purpose-built graph technology.
Graph data science is proving highly successful in a number of contexts. It leverages the connections between data points for more accurate and interpretable predictions, so it is used to power strategic predictions that help data scientists answer exacting business questions and explain outcomes. It is an inherently powerful technique because graph algorithms can reason about the ‘shape’ of the connected context around each piece of data. This enables much richer machine learning predictions than we’ve seen before.
Analysts agree. In its 2020 report on data and analytics tech trends, Gartner predicts that “Finding relationships in combinations of diverse data, using graph techniques at scale, will form the foundation of modern data and analytics.” It also found that a remarkably high 92% of respondents said they plan to employ graph technology within five years.
Leveraging connected data in machine learning
Graph technology has already made contributions to many use cases, from fraud detection to tracking a customer journey. It achieves this by using the connections between data points as context for each prediction. In a drug discovery scenario, this means identifying new associations between genes, diseases, drugs and proteins, with the surrounding connected data providing the context needed to assess the relevance or validity of any such discovery. For customer recommendations, applying graph technology means learning from user journeys to make accurate, data-driven recommendations. These graph-enriched recommendations suggest future purchases and present options drawn from previous buying history.
Most data science teams in the corporate world are still learning how to leverage connected data in their machine learning work. But this is starting to change. One example of graph data science in action is at a US media company called Meredith Corporation. Meredith has a digital presence reaching 180 million users monthly across multimedia platforms through 40+ publications, from Better Homes & Gardens to Allrecipes. The firm is dedicated to giving users specific and personalised experiences, delivering the right content at the right time.
Entity resolution enables Meredith to create descriptive audience segments leading to personalised experiences and boosting traffic to their media properties. A community detection algorithm means it’s able to disambiguate anonymous traffic on its media properties, turning 14 billion anonymous users into 163 million unique user profiles. As a result, it’s created a better user experience. This has led to a remarkable 612% increase in visits per profile.
Meredith is now adding tens of millions of customer data points per day to its 40-billion-node graph. These results are used internally to inform business decisions and externally by advertisers for more accurate targeting. As the company’s Senior Data Scientist comments: “We use the graph algorithms to transform billions of page views into millions of pseudonymous identifiers with rich browsing profiles, which translates into significant revenue gains and better-served consumers.”
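To make the idea of resolving anonymous traffic into profiles more concrete, here is a minimal Python sketch. It is not Meredith's pipeline: it swaps in connected components (via union-find), the simplest relative of a community detection algorithm, and every identifier in it (the cookie, login and device names, and the `resolve_profiles` function) is hypothetical illustration data.

```python
from collections import defaultdict


def resolve_profiles(observations):
    """Group anonymous identifiers that are linked through shared keys.

    observations: iterable of (anonymous_id, linking_key) pairs, where a
    linking key is anything two visits might share (hypothetical examples:
    a login or a device fingerprint).
    Returns a list of sets, one set of anonymous ids per resolved profile.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each observation is an edge between an id node and a key node.
    for anon_id, key in observations:
        union(("id", anon_id), ("key", key))

    # Collect only the id nodes, grouped by the component they ended up in.
    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "id":
            groups[find(node)].add(node[1])
    return sorted(groups.values(), key=lambda ids: sorted(ids))


# Hypothetical observations: five visits tied together by logins and devices.
observations = [
    ("cookie_a", "login_1"),
    ("cookie_b", "login_1"),
    ("cookie_b", "device_9"),
    ("cookie_c", "device_9"),
    ("cookie_d", "login_2"),
]
profiles = resolve_profiles(observations)
```

Here `cookie_a`, `cookie_b` and `cookie_c` collapse into one profile because they are connected through a shared login and a shared device, while `cookie_d` remains separate. A production system would likely use a graph platform's built-in algorithms rather than hand-rolled code, and would weigh how trustworthy each link is.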
Representing complex space and time data
Another example is how New York-Presbyterian Hospital uses graph technology to better track and contain in-hospital infections. Scientists at the hospital found that graphs were the best way to connect all the dimensions of an infection ‘event’—the what, when, and where it happened—and prevent future outbreaks.
The team created a ‘time’ and ‘space’ tree to model all the treatment rooms in the hospital. This surfaced a large number of previously hidden connections. “It was important to us to be able to look at events across both time and space,” its Analytics Project Leader, Michael Zelenetz, has said. “We wanted to ask questions like, ‘Who got an infection during their stay?’ and ‘Who was sharing a room with them at the time they contracted that infection?’ The problem we kept running into was that we wanted to know what happened to patients during their visit and where they were at the time that a certain event happened.”
Previously, New York-Presbyterian Hospital had difficulty modelling these scenarios because it had no way to capture the space and time aspects of the problem, and effective tracking of infections requires exactly that data. Once a new ‘event’ entity was introduced to connect the time and location trees, the resulting graph-enriched data model enabled the hospital to identify and contain an outbreak before it spreads. The analytics team says it is now positioned to better analyse numerous other kinds of hospital events, from tests to surgeries.
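The modelling idea described above, an ‘event’ that ties a position in a location tree to a position in a time tree, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hospital's actual schema: the `Event` fields, the building/floor/room path, the patient names and both query functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    patient: str
    kind: str    # hypothetical event types: "stay" or "infection"
    room: tuple  # path in the space tree: (building, floor, room)
    day: date    # leaf in the time tree: year -> month -> day


def roommates_at_infection(events, patient):
    """Who shared a room with `patient` on the day of their infection?"""
    # Index events by (room, day): the point where both trees meet.
    index = {}
    for ev in events:
        index.setdefault((ev.room, ev.day), []).append(ev)

    exposed = set()
    for ev in events:
        if ev.patient == patient and ev.kind == "infection":
            for other in index.get((ev.room, ev.day), []):
                if other.patient != patient:
                    exposed.add(other.patient)
    return exposed


def events_in_subtree(events, room_prefix, year, month):
    """Roll up: all events under a space-tree node within a given month."""
    depth = len(room_prefix)
    return [
        ev for ev in events
        if ev.room[:depth] == room_prefix
        and (ev.day.year, ev.day.month) == (year, month)
    ]


# Hypothetical data: two patients share room 301 on the day of an infection.
room_301 = ("Main", 3, "301")
room_302 = ("Main", 3, "302")
events = [
    Event("patient_1", "stay", room_301, date(2020, 1, 5)),
    Event("patient_2", "stay", room_301, date(2020, 1, 5)),
    Event("patient_1", "infection", room_301, date(2020, 1, 5)),
    Event("patient_3", "stay", room_302, date(2020, 1, 5)),
    Event("patient_4", "stay", room_301, date(2020, 1, 6)),
]
exposed = roommates_at_infection(events, "patient_1")
floor_events = events_in_subtree(events, ("Main", 3), 2020, 1)
```

Because every event carries both a location path and a date, the same records answer a narrow question (who shared a room on a given day) and a broad one (everything that happened on a floor in a given month). In a graph database the two trees and the events would be nodes joined by relationships rather than tuples and dates.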
Such interesting use cases suggest that now is a perfect time to evaluate the proven potential of graph data science for your data-driven analytics projects. It’s a great way to power up your organisation’s data and analytics capabilities. It also opens up the possibility of much more understandable machine learning.
The author is Senior Director, Graph Data Science, at the world’s leading graph database, Neo4j