What are Graphs?
Talk to a scientist in just about any discipline, and ask them the question — based on their discipline — “how does that stuff work?” You’ll likely find that there are systems and networks that you have to consider before you can really understand how any given thing works: whether that’s the human body, a food chain in an ecosystem, a chemical reaction, or a society as a whole. Without understanding the relationship between two animals in an ecosystem, two atoms in a molecule, or cells and tissues in our body, you just have a bunch of data: a list of cells, a readout of animals and what they eat, etc.
Traditional machine learning models often take data this way: they take lists or tables of data and do some stuff (the details of which depend on the algorithm being used as well as a few other parameters) to make predictions about a thing. But for some problems, there’s a better way.
If this in-depth educational content is useful for you, subscribe to our AI research mailing list to be alerted when we release new material.
Graphs are data structures that represent networks of, or relationships between the data they contain. Typically, they’re represented as “nodes” and lines, or “edges”.
Figure 1, above is an example of an “undirected graph”, or, a graph in which data has a two-way relationship with other data. This means that each node in our graph, each representing a person in a social network, is connected with each other node. This might represent a networking site like LinkedIn, where once you’re connected with someone you’re connected with each other. An example of a directed social network graph might be Twitter, where users follow each other (and not always mutually).
The images which represent data (users) are called nodes or vertices, and the lines which connect them are called edges. These lines represent relationships between the vertices, and can be represented, as an “all or nothing relationship” (i.e: you’re following someone or you aren’t) or as a “weighted” relationship (i.e: a solid line can represent higher engagement between two users, while a dotted line can represent a weaker relationship or lower engagement).
At this point, it’s possible you’re feeling how I felt when I first was introduced to graphs and graph theory in a computer science class: bored and possibly slightly confused. The good news is, since we’ve covered some of the terminology that’s necessary to understand the good stuff, we can start to get into why graphs matter, and what makes them so cool.
So, what?
Graphs are already used for some pretty neat stuff in computer science: your Maps application, for example, is using graphs behind the scenes to store data about locations and the streets that connect them, and is using shortest-distance algorithms to find you the shortest route to your destination.
But it gets even better when we start to look at using graphs for machine learning. Because graphs demonstrate comprehensive relationships between pieces of data (as compared to ordered lists of data, or tensors which tell us little about the relationships between data points or features by themselves), we can use them to perform in-depth analysis and make predictions based on these relationships.
Graph Theory & Machine Learning — But How?
Before we get to reap the benefits of combining these graphs or networks we keep talking about with machine learning, we have to somehow represent our graph in a way that a computer — and then a machine learning algorithm — can understand.
Graphs can be represented traditionally in one of three basic ways:
- An Adjacency Matrix
Adjacency matrices do… kind of just what they sound like they’d do. They represent connections, or edges, between different nodes using a matrix. We can look at an example to illustrate what this might look like:
Here, we can find an edge between two notes by finding where their labels intersect in our matrix. We can see, for example, that node 1 is not connected to itself, but that it is connected to node 2.
2. An Edge List
An edge list is another way to represent our network — or graph — in a way that’s computationally understandable. Here, we represent pairs of connected nodes within a list. You can see an example below:
3. An Adjacency List
Adjacency lists combine the above two approaches, representing a list of nodes, connected to a list of all of the nodes they’re directly connected to. To illustrate, let’s look at an example:
With the above three approaches, we’re able to tackle the difficulty of representing graphs computationally in our code. However, there are still some challenges when feeding graphs to machine learning models. Traditionally, deep learning models are good at handling data which takes up a fixed amount of space and is unidirectional. No matter how we represent them, graphs don’t take up a fixed amount of space in memory, and aren’t continuous, but rather, each node holds a reference to nodes it’s directly connected to.
There are some really fabulous ways of tackling these challenges, which I’ll dive deeper into in a later article. For now, I’ll leave you with a few resources to research on your own should you be interested, which are providing us with new ways to expand the problems machine learning is able to solve.
- Learn about Deep Learning on Graphs with Graph Convolutional Networks
- Learn about DeepWalk — a project finding another way to represent graphs for machine learning
Graph Theory and Machine Learning — What Can we Do With It?
Nothing exists in a vacuum, and understanding the interconnected networks of data that make up many of our scientific disciplines provides the exciting potential to answer so many questions — more than I can begin to wrap into this article.
What if we could better understand the human brain? What if we could make predictions about the effect of some stimulus or change on an ecosystem? Or, predict which compound is the most likely to create an effective drug?
The best part of everything we’ve just learned is that we can — and it isn’t simply a theoretical possibility, but something we’re doing right now!
Graph theory is already being used for:
- Diagnostic modeling (predicting to a certain degree of certainty whether or not a patient has a specific diagnosis).
- Helping with the diagnoses and treatment of patients with cancer.
- Developing pharmaceuticals (medications).
- Seeking to develop a theoretical synthesis between the theories of ecology and evolution.
How Graph Theory Makes it All Happen
Let’s dive a little deeper into these applications, so we can look at the utilization of graph theory within them in more detail.
Let’s use diagnostic models as an example. Specifically, I want to look at this example of network analysis being used for the diagnosis and identification of possible schizophrenia in patients: Graph-based network analysis in schizophrenia.
Using graphs to represent network analyses of the brain, neuroscientists are able to map key findings related to the diagnosis of schizophrenia. Given that there are certain markers for the onset of schizophrenia:
- less efficiently wired networks
- less local clustering
- less hierarchical organization
We could potentially evaluate these networks with a machine learning algorithm and predict the probability a patient has or will develop schizophrenia based on these markers.
Without the knowledge of these networks, in this example, this kind of analyses becomes an entirely different neurological analysis of the patient. The promising discoveries of these findings for schizophrenia has promising implications for the diagnosis and treatment of this disorder — possible early diagnosis and intervention that goes far beyond simply evaluating symptoms.
This is just an example, but is entirely illustrative of the benefits of graph theory in machine learning as it intersects with other disciplines.
The fact of the matter is, there is often much more to our data than we can represent in lists, data frames, or tensors alone. While there are ways to explore our data and present it in such a way that we can hypothesize relationships and even enable our algorithms to predict these, when we’re able to represent the connections between our data in a different way, we’re able to do more with the data that we have.
When we understand the ways in which things relate to one another, we understand them better: we can make more comprehensive predictions, and answer harder questions, with some pretty life changing results.
Want to learn more? Check out the rest of the series:
- Graph Convolutional Networks — Explained
- What Makes Graph Convolutional Networks Work?
- Graph Convolutional Network Implementation With the PROTEINS Benchmark Dataset
Attributions:
- Everything You Need to Know About Graph Theory and Machine Learning
- Graph Theory & Machine Learning in Neuroscience
- Knowing Your Neighbours — Machine Learning on Graphs
- Network-based machine learning and graph theory algorithms for precision oncology
This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.
Enjoy this article? Sign up for more AI updates.
We’ll let you know when we release more technical education.
Leave a Reply
You must be logged in to post a comment.