SciPy 2019 Tutorial on Multi-dimensional Linked Data Exploration with Glue
- Where/when/who
- Target audience
- Background
- Installing glue
- Updating glue
- Downloading the required data
- Testing out your installation
- Tutorial Schedule
- Running notes/questions
- Scripts
- Exploration time
- Jupyter notebooks
- More data
A tutorial on glue, a Python application and library for Multi-dimensional Linked Data Exploration, will take place on 9th July at SciPy 2019 (from 1:30pm to 5:30pm in Room 101).
The tutorial will be led by Thomas Robitaille - for any questions ahead of the tutorial, you can email [email protected].
The intended audience is participants from any discipline who are interested in learning more about glue, how to use it to explore data, and how to customize it. No Python experience is required for the parts covering the Qt graphical user interface, but basic Python knowledge will be needed by participants who want to customize the Qt application or try out the Jupyter/web version. Participants of all levels of Python expertise are welcome!
Glue is a Python application and library to explore relationships within and among related datasets. Its main features include:
- Linked Statistical Graphics. With Glue, users can create scatter plots, histograms and images (2D and 3D) of their data. Glue is focused on the brushing and linking paradigm, where selections in any graph propagate to all others.
- Flexible linking across data. Glue uses the logical links that exist between different data sets to overlay visualizations of different data, and to propagate selections across data sets. These links are specified by the user, and are arbitrarily flexible.
- Full scripting capability. Glue is written in Python, and built on top of its standard scientific libraries (i.e., NumPy, Matplotlib, SciPy). Users can easily integrate their own Python code for data input, cleaning, and analysis. Glue can now be used either as a Qt application, or inside Jupyter notebooks and JupyterLab.
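The brushing and linking idea described above can be illustrated with a small conceptual sketch. It uses plain Python only and is not glue's API or implementation: the two toy datasets, their values, the `LINKS` mapping, and the `range_mask` helper are all made up for illustration. The point it tries to show is that a selection can be stored as a rule on a quantity, so it can be re-evaluated on any dataset that has a linked column.

```python
# Conceptual sketch only: plain Python, not glue's API or implementation.
# All names and values below are hypothetical and made up for illustration.
catalog = {
    "RAJ2000": [10.0, 10.5, 11.0, 11.5],
    "flux": [1.2, 3.4, 0.7, 2.2],
}
image_pixels = {
    "Right Ascension": [10.2, 11.4, 12.0],
    "value": [5, 9, 2],
}

# A user-declared link: these two columns describe the same quantity.
LINKS = {"RAJ2000": "Right Ascension"}


def range_mask(dataset, column, lo, hi):
    """Return one boolean per row: True where lo <= value <= hi."""
    return [lo <= value <= hi for value in dataset[column]]


# The selection is stored as a rule (a range on a quantity), not as row
# numbers, so it can be re-evaluated on any dataset with a linked column.
column, lo, hi = "RAJ2000", 10.4, 11.6

catalog_mask = range_mask(catalog, column, lo, hi)
image_mask = range_mask(image_pixels, LINKS[column], lo, hi)

# Within one dataset, the same mask applies to every other column, which is
# what lets a selection made in one plot show up in the others.
selected_flux = [f for f, keep in zip(catalog["flux"], catalog_mask) if keep]

print(catalog_mask)
print(image_mask)
print(selected_flux)
```

In this toy version a link is only a renaming of columns. The text above describes glue's links as arbitrarily flexible, so treat this as the simplest possible case.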
You can also see glue in action in this short video.
Glue works on macOS, Linux, and Windows, and supports Python 2.7, 3.6, and 3.7. For complete installation instructions, see the glue documentation. Below we provide a summary of the most common methods for installing glue. If you have never used Python before, we recommend downloading the Miniconda distribution and then following the instructions for conda users below.
If you are using conda, for this tutorial we recommend that you create a new environment to install glue and its dependencies. First, make sure your version of conda is up to date:
conda update -n root conda
then create the environment and install glue with:
conda create -n scipy2019-glue -c glueviz python=3.7 glueviz
This will ensure that you have the latest versions of all dependencies. To switch to this environment, you can use:
conda activate scipy2019-glue
If you find that conda is slow to install all dependencies, you can also create an empty environment and pip install glue into it:
conda create -n scipy2019-glue -c glueviz python=3.7
conda activate scipy2019-glue
pip install PyQt5 glueviz
If you don't use conda, we recommend installing glue using pip:
pip install PyQt5 glueviz
If you already have glue installed in a conda environment and want to update it, you should be able to do this with:
conda install -c glueviz glueviz=0.15
and if you are using pip, you can instead do:
pip install glueviz --upgrade
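To see which version is currently installed before or after updating, something like the following sketch may help. It assumes Python 3.8 or later (for `importlib.metadata`, which is not available in the older Python versions listed above) and assumes the distributions are named `glueviz` and `glue-core`. The `installed_version` helper is a hypothetical name introduced here, not part of glue.

```python
from importlib import metadata  # standard library, Python 3.8 or later


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your installation differs.
for name in ("glueviz", "glue-core"):
    print(name, installed_version(name) or "not installed")
```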
To get the data required for this tutorial, you can either download this file or you can clone the glue data repository:
git clone https://github.com/glue-viz/glue-example-data.git
To check if glue is correctly installed, go inside the data directory you downloaded in the previous step, and type:
glue Planes/boston_planes_48h.csv
on the command line, and the glue application should open and show the boston_planes_48h dataset in the top left.
If you run into any installation issues, you can email the main tutorial organizer at [email protected] or open an issue in this repository.
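If the application does not open, a quick check of which packages Python can find may help narrow the problem down before you report it. The sketch below uses only the standard library. The `missing_packages` helper is a hypothetical name introduced here, and the list of packages checked is an assumption based on the install commands and libraries mentioned above.

```python
from importlib import util


def missing_packages(names):
    """Return the top-level package names that Python cannot find."""
    return [name for name in names if util.find_spec(name) is None]


# 'glue' is the import name used by the tutorial scripts; the other
# entries are assumptions based on the dependencies mentioned above.
missing = missing_packages(["glue", "PyQt5", "numpy", "matplotlib"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All checked packages were found")
```

Run this with the same Python that the `scipy2019-glue` environment uses, otherwise the result describes a different installation.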
Note: the times below are approximate!
The tutorial will kick off with an overview of the main concepts behind glue and the key functionality in the Qt/desktop and Jupyter/web versions of glue. This will be an informal high-level presentation with time for participants to interrupt and ask questions.
This will be a hands-on session where participants will load the example data provided into the Qt/desktop version of glue. They will learn how to make various types of visualizations interactively, as well as how to make selections in the data and see those selections propagate between visualizations. The session will also cover the basics of linking different datasets, after which participants will see how selections can propagate from one dataset to another.
This will be a hands-on session looking at how participants can write simple Python functions to add functionality to glue - for example a custom data loader, or a custom tool accessible via the menu bar - without any knowledge of how to develop GUI code.
In this hands-on session, participants will be encouraged to spend time using glue to explore their own data, writing custom data loaders if needed. We will also provide a selection of different datasets for users that don’t have their own data handy to try out.
In this last hands-on session, we will go through the steps of getting started with glue in Jupyter Notebook and Lab, using the same data as in the Qt tutorial. If time permits, participants will be encouraged to also try out their own data in this version of glue.
Participants can take notes about issues or ask questions in this Google Doc.
The detailed plan for the instructors can be found here - feel free to take a look if you missed a particular command or step!
# Set up a glue session with two linked datasets and two viewers
from glue.core import DataCollection
from glue.core.link_helpers import LinkSame
from glue.core.data_factories import load_data
from glue.app.qt.application import GlueApplication
from glue.viewers.scatter.qt.data_viewer import ScatterViewer
from glue.viewers.image.qt.data_viewer import ImageViewer

# Load the image and the catalog into a single data collection
image = load_data('w5.fits')
catalog = load_data('w5_psc.csv')
dc = DataCollection([image, catalog])

# Link the sky coordinates of the image to those of the catalog
dc.add_link(LinkSame(image.id['Right Ascension'], catalog.id['RAJ2000']))
dc.add_link(LinkSame(image.id['Declination'], catalog.id['DEJ2000']))

app = GlueApplication(dc)

# Image viewer showing the image with the catalog overlaid
image_viewer = app.new_data_viewer(ImageViewer)
image_viewer.add_data(image)
image_viewer.add_data(catalog)
image_viewer.viewer_size = (500, 500)
image_viewer.state.layers[0].percentile = 99

# Scatter viewer showing two catalog columns, placed next to the image viewer
scatter_viewer = app.new_data_viewer(ScatterViewer)
scatter_viewer.add_data(catalog)
scatter_viewer.viewer_size = (500, 500)
scatter_viewer.position = (500, 0)
scatter_viewer.state.x_att = catalog.id['[4.5]-[5.8]']
scatter_viewer.state.y_att = catalog.id['[5.8]-[8.0]']

# Open the application window
app.start()

# Menubar plugin: adds a menu entry that colors every dataset orange
from glue.config import menubar_plugin

@menubar_plugin("Make all data orange")
def my_plugin(session, data_collection):
    for data in data_collection:
        data.style.color = 'orange'

# Custom data loader: reads a planes CSV file and adds a derived 'distance' column
import numpy as np
from pandas import read_csv
from glue.config import data_factory
from glue.core import Data


def is_planes_dataset(filename, **kwargs):
    # Use this loader for any file with 'planes' in its path
    return 'planes' in filename


@data_factory('Plane data reader', priority=10000,
              identifier=is_planes_dataset)
def read_plane_data(filename):
    df = read_csv(filename)
    data = Data()
    for column in df.columns:
        data[column] = df[column]
    # Distance from the origin, computed from the x and y columns
    data['distance'] = np.hypot(data['x'], data['y'])
    return data
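
The identifier function `is_planes_dataset` above matches any path that contains the lowercase text 'planes', including non-CSV files. If that turns out to be too broad for your own data, one possible stricter variant is sketched below. This is an illustration and not part of the tutorial scripts: the name `is_planes_csv` and the rule it applies (case-insensitive match on the file name only, `.csv` extension required) are assumptions.

```python
import os


def is_planes_csv(filename, **kwargs):
    """Hypothetical stricter identifier: match on the file name only."""
    # Ignore directory names and letter case, and require a .csv extension.
    name = os.path.basename(filename).lower()
    return name.endswith('.csv') and 'planes' in name


print(is_planes_csv('Planes/boston_planes_48h.csv'))
print(is_planes_csv('Planes/readme.txt'))
```

A function like this could be passed as `identifier=` in place of `is_planes_dataset`, since it accepts the same arguments.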
The tutorial will include time for you to explore glue, for example to try and use it on your own data.
If you are interested in trying out some of the available plugins, including domain-specific plugins, take a look at the list of available plugins, which includes installation instructions for each one.
If you don't have your own data to load into glue, or need ideas of things to look at, take a look at the data inside the Taxis directory in glue-example-data. This contains a record of all taxi trips in New York over the course of a month and a satellite image of New York. Try installing the glue-geospatial plugin (see the list of plugins above), then load both datasets and try to link them by longitude/latitude. You should be able to overlay the taxi data on the satellite image! Then have fun exploring the data. If you don't see the Taxis folder, make sure you have the latest version of the glue-example-data directory (see here for more details).
You can open the required Jupyter notebooks using the following links:
In addition to the data in the glue-example-data, you may want to try out some of the following larger datasets: