Knowledge Graphs Created through Basic Machine Learning
A Talk by Clair Sullivan (Graph Data Science Advocate, Neo4j)
About this Talk
Description
Knowledge graphs are all around us, and chances are you use at least one every day. They help solve problems ranging from question answering to search to recommendation engines. However, creating them can be a challenge.
The goal of this masterclass is to take participants from raw, unstructured text to a full knowledge graph. We will use basic Python tooling and natural language processing (NLP) techniques to extract subjects, verbs, and objects from the text and assemble them into the triples from which the graph is built. We will then populate a graph database and run introductory queries, such as question answering.
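The triple-extraction step can be illustrated with a minimal, dependency-based sketch. It assumes the sentence has already been run through a dependency parser (spaCy is one such library) that tags each token with a dependency label and the index of its head. The hand-written parse, the `Token` alias, and `extract_svo` are hypothetical and for illustration only, not the workshop's code.

```python
from typing import List, Tuple

# Hypothetical pre-parsed token: (text, dependency label, index of its head).
# Labels such as nsubj/dobj/ROOT mirror what a dependency parser emits;
# here they are written by hand for illustration.
Token = Tuple[str, str, int]

SUBJECT_DEPS = {"nsubj", "nsubjpass"}
OBJECT_DEPS = {"dobj", "obj"}


def extract_svo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) triples from one dependency-parsed sentence."""
    triples = []
    for head_idx, (head_text, _, _) in enumerate(tokens):
        # Children of this token that act as its subject or direct object.
        subjects = [t for t, dep, h in tokens if h == head_idx and dep in SUBJECT_DEPS]
        objects = [t for t, dep, h in tokens if h == head_idx and dep in OBJECT_DEPS]
        # A token with both a subject and an object child is treated as the verb.
        for subj in subjects:
            for obj in objects:
                triples.append((subj, head_text, obj))
    return triples


# "Curie discovered polonium." (the root verb is its own head)
sentence = [
    ("Curie", "nsubj", 1),
    ("discovered", "ROOT", 1),
    ("polonium", "dobj", 1),
    (".", "punct", 1),
]
triples = extract_svo(sentence)
print(triples)  # [('Curie', 'discovered', 'polonium')]
```

Real sentences need more rules (compound nouns, prepositional objects, conjunctions), which is where most of the effort in SVO detection goes.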
Once participants are comfortable with basic graph querying and traversals, we will move on to graph data science and machine learning on graphs. We will explore techniques for creating graph embeddings and use those embeddings as features for binary node classification.
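The talk does not name a specific embedding algorithm. As one illustrative choice, the sketch below follows the idea behind FastRP, a random-projection embedding available in the Neo4j Graph Data Science library: start from a very sparse random matrix and repeatedly average it over each node's neighbours. This is a simplified NumPy version for intuition, not the GDS implementation; the function name, the sparsity `s=3`, and the iteration weights are assumptions.

```python
import numpy as np


def fastrp_like_embeddings(adj, dim=8, iteration_weights=(1.0, 1.0), s=3.0, seed=0):
    """Simplified FastRP-style node embeddings (illustrative, not the GDS implementation)."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Very sparse random projection: entries are +sqrt(s), 0, -sqrt(s)
    # with probabilities 1/(2s), 1 - 1/s, 1/(2s).
    r = rng.choice([-1.0, 0.0, 1.0], size=(n, dim),
                   p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]) * np.sqrt(s)
    # Row-normalised adjacency: multiplying by it averages over each node's neighbours.
    deg = adj.sum(axis=1, keepdims=True)
    a_norm = np.divide(adj, deg, out=np.zeros(adj.shape), where=deg > 0)
    emb = np.zeros((n, dim))
    current = r
    for weight in iteration_weights:
        # Propagate one more hop: N_i = A_norm @ N_{i-1}.
        current = a_norm @ current
        # L2-normalise rows before weighting so each hop contributes on a similar scale.
        norms = np.linalg.norm(current, axis=1, keepdims=True)
        emb += weight * np.divide(current, norms, out=np.zeros_like(current), where=norms > 0)
    return emb


# Toy graph: a star (0 linked to 1, 2, 3) and a triangle (4, 5, 6).
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (5, 6)]
adj = np.zeros((7, 7))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
emb = fastrp_like_embeddings(adj, dim=8)
# Nodes 1, 2 and 3 have identical neighbourhoods, so their embeddings coincide.
print(emb.shape)  # (7, 8)
```

Nodes with similar neighbourhoods end up with similar vectors, which is what makes these embeddings useful as classifier features.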
This masterclass is targeted at data scientists and machine learning engineers of all levels. Participants will learn how to use basic NLP to construct a small knowledge graph with Python and a graph database, and then how to query the graph and apply machine learning to it.
By the end, participants will be able to go from raw, unstructured text to a fully functional knowledge graph, which they will have analyzed and on which they will have performed node classification using common data science techniques and tooling.
Key Topics
- Natural language processing of unstructured text
- Populating a graph based on the output of NLP while querying other data sources on the internet
- Graph queries and traversals using Cypher
- Graph embedding techniques
- Graph-based machine learning
Target Audience
- Data Scientists
- Data Engineers
- Machine Learning Engineers
- Data Analysts
- Managers of the above
Goal
Get hands-on experience using NLP to create a knowledge graph from scratch and analyze it using graph analytics and graph-based machine learning
Session Outline
- Brief introduction to the types of data that can be used to create a knowledge graph
  - Word co-occurrence
  - RDF
  - Subject-verb-object (SVO) triples
- Introduction to natural language processing as it applies to knowledge graphs
- Method 1 for the creation of a knowledge graph: a full treatment of SVO triple detection
- Method 2 for the creation of a knowledge graph: combining basic NLP with online data sources
- Methods for graph-based machine learning
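Word co-occurrence, the first data type in the outline, is the simplest way to get a graph out of text: words become nodes, and an edge links two words that appear together. A minimal sketch, assuming a sentence-level window and a crude whitespace tokenizer (both simplifications); `cooccurrence_edges` is a hypothetical helper:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, stopwords=frozenset()):
    """Count, for each unordered word pair, how many sentences contain both words."""
    counts = Counter()
    for sentence in sentences:
        # Crude tokenisation: split on whitespace, strip punctuation, lowercase.
        words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
        words = {w for w in words if w and w not in stopwords}
        # Each unordered pair of distinct words in the same sentence is one co-occurrence.
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    return counts


edges = cooccurrence_edges(
    ["Neo4j stores graphs.", "Cypher queries graphs in Neo4j."],
    stopwords={"in"},
)
print(edges[("graphs", "neo4j")])  # 2
```

Each key becomes an edge and its count an edge weight; unlike SVO triples, these edges carry no relationship type.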
Format
This class is very hands-on.
The class will begin in a lecture format but will quickly move to hands-on coding exercises.
We will work in Jupyter or Google Colab notebooks (the participant’s choice), using standard Python packages to pull data from the Google Knowledge Graph, Wikipedia, and Wikidata and to create knowledge graphs with two different approaches. The graphs themselves will be created in Neo4j using free Sandbox instances. Queries will be demonstrated in the Cypher query language, both from within the Neo4j Browser and from Python in the notebook environment.
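Loading extracted triples into Neo4j from Python typically means sending a parameterized Cypher statement. The sketch below only builds the query string and its parameters, so it runs without a database; the `Entity` label, the `RELATION` relationship type, the `verb` property, and `triples_to_cypher` are hypothetical names chosen for illustration.

```python
def triples_to_cypher(triples):
    """Build one parameterised Cypher statement that MERGEs every (subject, verb, object) triple."""
    query = (
        "UNWIND $rows AS row\n"
        # MERGE avoids duplicate nodes when an entity appears in several triples.
        "MERGE (s:Entity {name: row.subject})\n"
        "MERGE (o:Entity {name: row.object})\n"
        # The verb is stored as a property so the whole statement stays parameterised.
        "MERGE (s)-[:RELATION {verb: row.verb}]->(o)"
    )
    rows = [{"subject": s, "verb": v, "object": o} for s, v, o in triples]
    return query, {"rows": rows}


query, params = triples_to_cypher([("Curie", "discovered", "polonium")])
print(query)
print(params)
```

Storing the verb as a property keeps the statement fully parameterized; using the verb as the relationship type itself would require building the type into the query text. With the official `neo4j` Python driver, the pair could then be sent to a Sandbox instance with `session.run(query, params)`.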
Once we progress to graph-enabled machine learning, we will use the Graph Data Science Library available in Neo4j to create graph embeddings for traditional machine learning tasks. These tasks will be executed both within Neo4j and in Python using scikit-learn.
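The Python side of the classification step might look like the sketch below. Random, well-separated vectors stand in for embeddings exported from Neo4j (a hypothetical stand-in, not real workshop data), and scikit-learn's `LogisticRegression` is one reasonable choice of binary classifier, not necessarily the one used in the session.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for node embeddings read back from Neo4j:
# two well-separated clusters of 16-dimensional vectors, one per class.
rng = np.random.default_rng(42)
n_per_class, dim = 50, 16
X = np.vstack([
    rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, dim)),
    rng.normal(loc=1.0, scale=0.5, size=(n_per_class, dim)),
])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out 30% of nodes, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

In practice `X` would be the embedding vectors written back to each node and `y` a known node property; real embeddings are rarely this separable, so expect lower accuracy.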
Level
Beginner - Intermediate
Prerequisite Knowledge
Basic Python and familiarity with standard packages (pandas, NumPy, scikit-learn)
You need an access pass to attend this session: either the Diversity Access Pass or the Full Access Pass applies.