Paco Nathan

Graph Data Science

A Talk by Paco Nathan (Managing Partner, Derwen Inc.)

About this Talk

Description

Python has excellent libraries for working with graphs, which provide semantic technologies, graph queries, interactive visualizations, graph algorithms, probabilistic graph inference, as well as embedding and other integrations with deep learning.

However, almost none of these have integration paths other than writing lots of custom code, and most do not share common file formats. Moreover, few of these libraries integrate effectively with popular data science tools (e.g., pandas, scikit-learn, PyTorch, spaCy, etc.) or with popular infrastructure for scale-out (Apache Spark, Ray, RAPIDS, Apache Parquet, fsspec, etc.) in cloud computing.

This tutorial introduces kglab – an open source project that integrates RDFlib, OWL-RL, pySHACL, NetworkX, iGraph, pslpython, node2vec, PyVis, and more – to show how to use a wide range of graph-based approaches, blending smoothly into data science workflows, and working efficiently with popular data engineering practices.
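Integrations like these depend on moving between an RDF graph and the structures that graph-algorithm and numeric libraries expect. Below is a minimal, library-free sketch of one such transformation. It assumes triples are held as plain Python string tuples rather than rdflib terms, and it projects the edges for a single predicate onto a 0/1 adjacency matrix. The recipe identifiers and the `to_adjacency` helper are hypothetical illustrations, not part of kglab's API.

```python
# Minimal sketch: project (subject, predicate, object) triples onto a
# 0/1 adjacency matrix for one predicate. Triples are plain strings here
# (hypothetical recipe data), not rdflib terms.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_adjacency(triples: List[Triple], predicate: str) -> Tuple[List[str], List[List[int]]]:
    """Return (node labels, adjacency matrix) for edges that use `predicate`."""
    edges = [(s, o) for s, p, o in triples if p == predicate]
    # Node order follows first appearance among the selected edges
    nodes = list(dict.fromkeys(n for edge in edges for n in edge))
    index: Dict[str, int] = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for s, o in edges:
        matrix[index[s]][index[o]] = 1  # directed edge s -> o
    return nodes, matrix


triples = [
    ("recipe:pancakes", "ex:hasIngredient", "food:egg"),
    ("recipe:pancakes", "ex:hasIngredient", "food:flour"),
    ("recipe:omelette", "ex:hasIngredient", "food:egg"),
    ("recipe:pancakes", "ex:cookTime", "20"),  # ignored: different predicate
]

nodes, matrix = to_adjacency(triples, "ex:hasIngredient")
for label, row in zip(nodes, matrix):
    print(label, row)
```

Filtering to one predicate keeps the matrix meaningful as a single relation. In practice, such a structure would more likely be handed to NetworkX, iGraph, or a sparse-matrix library than kept as nested lists.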

The material emphasizes hands-on coding examples that you can reuse; best practices for integrating and leveraging other useful libraries; history and bibliography (e.g., links to primary sources); accessible, detailed API documentation; a detailed glossary of terminology; plus links to many helpful resources, such as online "playgrounds", all while keeping a practical focus on use cases.

The coding exercises in this tutorial follow a progressive example based on cooking recipes, which illustrates the use of kglab and related libraries in Python for graph data science. In addition to the hands-on use of open source, we'll highlight graph thinking at several points. Graph thinking is a cognitive framework for approaching AI problems with graph technologies.

Key Topics

  • Hands-on experience with popular open source libraries in Python for building knowledge graphs (KGs), including rdflib, pyshacl, networkx, owlrl, pslpython, and more
  • Coding examples that can be used as starting points for your own KG projects
  • How to blend different graph-based approaches within a data science workflow to complement each other’s strengths: for data quality checks, inference, human-in-the-loop, etc.
  • Integrating with popular data science tools, such as pandas, scikit-learn, matplotlib, etc.
  • Graph-based practices that fit well with Big Data tools such as Spark, Parquet, Ray, RAPIDS, and so on
  • Overall, how to apply graph thinking for problem solving.

Target Audience

  • Python developers who need to work with KGs
  • Data Scientists, Data Engineers, Machine Learning Engineers
  • Technical Leaders who want hands-on KG implementation experience
  • Executives working on data strategy who need to learn about KG capabilities
  • People interested in developing personal knowledge graphs

Goals

  • The overall goal for this course is to give each participant hands-on experience using a wide range of Python open source libraries that enable graph data science practices.
  • The course is led by an expert practitioner who can help answer questions based on experience with industry use cases.
  • Moreover, we'll emphasize the "graph thinking" approach to problem solving, which is essential for these practices.

Session outline:

  • Sources for data and controlled vocabularies: using a progressive example based on a Kaggle dataset for food/recipes
  • KG Construction in rdflib and Serialization in TTL, JSON-LD, Parquet, etc.
  • Transformations between RDF graphs and algebraic objects
  • Interactive Visualization with PyVis
  • Querying with SPARQL, with results in pandas
  • Graph-based validation with SHACL constraint rules
  • A sampler of graph algorithms in networkx and igraph
  • Inference based on semantic closures: RDFS, OWL-RL, SKOS
  • Inference and data quality checks based on probabilistic soft logic
  • Embedding (deep learning) for data preparation and KG construction
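The outline item on inference through semantic closures refers to deriving the triples that a vocabulary's rules entail. The sketch below is a hand-rolled illustration, not the OWL-RL library, which implements far more rules. It applies just two RDFS entailment rules until a fixed point is reached: transitivity of rdfs:subClassOf (rdfs11) and type propagation along subclasses (rdfs9). The class and instance names are hypothetical.

```python
# Hand-rolled sketch of two RDFS entailment rules, applied to a fixed point.
# Not the OWL-RL library; class and instance names are hypothetical.
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Repeat until nothing new is entailed:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closed = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new: Set[Triple] = set()
        # rdfs11: chain subclass links
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: lift instance types to superclasses
        for x, p, a in closed:
            if p == TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closed:  # fixed point reached
            return closed
        closed |= new


facts = {
    ("food:Egg", SUBCLASS, "food:AnimalProduct"),
    ("food:AnimalProduct", SUBCLASS, "food:Ingredient"),
    ("ex:egg42", TYPE, "food:Egg"),
}

inferred = rdfs_closure(facts)
for triple in sorted(inferred - facts):
    print(triple)
```

This sketch re-scans every triple on each pass to stay readable. A production reasoner would index triples by predicate and apply only the rules affected by newly derived facts.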

Format

  • This class will be a mix of lecture/slides plus live coding examples, along with much Q&A.
  • All of the material is available through a public GitHub repository: https://github.com/DerwenAI/kglab/
  • The repository has detailed instructions for installing the kglab library and getting started. Alternatively, a Docker Compose container image is available.
  • Each participant is expected to download and install from this repository. All of the course materials will then be available locally as Jupyter notebooks.

Level

Beginner to Intermediate

Prerequisite Knowledge

  • Some coding experience in Python (you can read a 20-line program)
  • Interest in use cases that require knowledge graph representation
  • Additionally, having completed Algebra 2 in secondary school and having some business experience with data analytics can both come in handy.

You need an access pass to attend this session: either the Diversity Access Pass or the Full Access Pass applies.

01 December 2021, 06:00 PM - 08:00 PM

About the Speaker

Paco Nathan

Managing Partner, Derwen Inc.

Known as a "player/coach", with core expertise in data science, cloud computing, natural language, and graph technologies; ~40 years of tech industry experience, ranging from Bell Labs to early-stage start-ups.