Commit
update prompt for text summarization notebook (#173)
The notebook is failing because the title of the Wikipedia page it uses for the prompt has changed from "Queen (band)" to "Queen(band)". This PR updates the title in the relevant cell.
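One way to make the lookup cell less sensitive to future title changes is to try several candidate titles in order and use the first one that resolves. The sketch below is illustrative only and is not part of this commit: `first_resolvable`, its `fetch` parameter, and the stand-in `fake_fetch` are hypothetical names, and the fake page store exists so the example runs without network access. In the notebook, `fetch` could be something like `lambda t: wikipedia.page(t, auto_suggest=False).content`, assuming the installed `wikipedia` package accepts the `auto_suggest` argument.

```python
from typing import Callable, Iterable, Tuple


def first_resolvable(
    titles: Iterable[str], fetch: Callable[[str], str]
) -> Tuple[str, str]:
    """Return (title, content) for the first title that `fetch` can resolve.

    `fetch` is expected to raise an exception when a title cannot be found.
    """
    errors = []
    for title in titles:
        try:
            return title, fetch(title)
        except Exception as err:  # broad on purpose: lookup errors vary by library
            errors.append(f"{title!r}: {err}")
    raise LookupError("no candidate title resolved: " + "; ".join(errors))


# Hypothetical stand-in for a real page lookup, so the sketch runs offline.
_FAKE_PAGES = {"Queen(band)": "Queen are a British rock band ..."}


def fake_fetch(title: str) -> str:
    return _FAKE_PAGES[title]  # raises KeyError for unknown titles


title, text = first_resolvable(["Queen (band)", "Queen(band)"], fake_fetch)
print(title)
```

Passing the old and new titles together means the cell keeps working whichever spelling the lookup currently accepts, and the collected error messages make it clear why each rejected candidate failed.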
1 parent 8556f58, commit 9c47d74
Showing 1 changed file with 76 additions and 76 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,13 +2,14 @@ | |
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Copyright (c) 2023 Graphcore Ltd. All rights reserved." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Text Summarization on IPUs using BART-L - Inference\n", | ||
"\n", | ||
|
@@ -18,11 +19,11 @@ | |
"| Domain | Tasks | Model | Datasets | Workflow | Number of IPUs | Execution time |\n", | ||
"|---------|-------|-------|----------|----------|--------------|--------------|\n", | ||
"| NLP | Text summarization | BART-L | - | Inference | Recommended: 2 | 5 min |\n" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Environment setup\n", | ||
"\n", | ||
|
@@ -31,31 +32,31 @@ | |
"[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://ipu.dev/49bCUB)\n", | ||
"\n", | ||
"To run the demo using other IPU hardware, you need to have the Poplar SDK enabled and a PopTorch wheel installed. Refer to the [Getting Started guide for your system](https://docs.graphcore.ai/en/latest/getting-started.html) for details on how to do this. Also refer to the Jupyter Quick Start guide for how to set up Jupyter to be able to run this notebook on a remote IPU machine." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Requirements\n", | ||
"\n", | ||
"Before running the model on IPUs you have to install the Python dependencies:" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install optimum-graphcore==0.7.1 wikipedia graphcore-cloud-tools[logger]@git+https://github.com/graphcore/[email protected]\n", | ||
"\n", | ||
"%load_ext graphcore_cloud_tools.notebook_logging.gc_logger" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In order to improve usability and support for future users, Graphcore would like to collect information about the applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:\n", | ||
"\n", | ||
|
@@ -64,32 +65,33 @@ | |
"- Environment details\n", | ||
"\n", | ||
"You can disable logging at any time by running `%unload_ext graphcore_cloud_tools.notebook_logging.gc_logger` from any cell." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"\n", | ||
"exec_cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"/tmp/exe_cache/\")" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Model preparation\n", | ||
"\n", | ||
"We start by preparing the model. First, we define the configuration needed to run the model on the IPU. `IPUConfig` is a class that specifies attributes and configuration parameters to compile and put the model on the device:" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from optimum.graphcore import IPUConfig\n", | ||
"\n", | ||
|
@@ -104,20 +106,20 @@ | |
" \"on_device_generation_steps\": 16,\n", | ||
" }\n", | ||
")" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Next, let's import `pipeline` from `optimum.graphcore` and create our summarization pipeline:" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from optimum.graphcore import pipeline\n", | ||
"\n", | ||
|
@@ -130,145 +132,145 @@ | |
" max_input_length=1024,\n", | ||
" truncation=True\n", | ||
")" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We define an input to test the model." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"input_test = 'In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name \"compiler\" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language (e.g. assembly language, object code, or machine code) to create an executable program.'\n", | ||
"input_test" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Compilation time for the 1st run: ~ 2:30" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"summarizer(input_test, max_length=150, num_beams=3)" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## A fairy tale long story short..." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The first call to the pipeline was a bit slow, it took several seconds to provide the answer. This behaviour is due to compilation of the model which happens on the first call.\n", | ||
"On subsequent prompts it is much faster:" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"the_princess_and_the_pea = 'Once upon a time there was a prince who wanted to marry a princess; but she would have to be a real princess. He travelled all over the world to find one, but nowhere could he get what he wanted. There were princesses enough, but it was difficult to find out whether they were real ones. There was always something about them that was not as it should be. So he came home again and was sad, for he would have liked very much to have a real princess. One evening a terrible storm came on; there was thunder and lightning, and the rain poured down in torrents. Suddenly a knocking was heard at the city gate, and the old king went to open it. It was a princess standing out there in front of the gate. But, good gracious! what a sight the rain and the wind had made her look. The water ran down from her hair and clothes; it ran down into the toes of her shoes and out again at the heels. And yet she said that she was a real princess. Well, we\\'ll soon find that out, thought the old queen. But she said nothing, went into the bed-room, took all the bedding off the bedstead, and laid a pea on the bottom; then she took twenty mattresses and laid them on the pea, and then twenty eider-down beds on top of the mattresses. On this the princess had to lie all night. In the morning she was asked how she had slept. \"Oh, very badly!\" said she. \"I have scarcely closed my eyes all night. Heaven only knows what was in the bed, but I was lying on something hard, so that I am black and blue all over my body. It\\'s horrible!\" Now they knew that she was a real princess because she had felt the pea right through the twenty mattresses and the twenty eider-down beds. Nobody but a real princess could be as sensitive as that. So the prince took her for his wife, for now he knew that he had a real princess; and the pea was put in the museum, where it may still be seen, if no one has stolen it. There, that is a true story.'\n", | ||
"the_princess_and_the_pea" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"summarizer(the_princess_and_the_pea, max_length=150, num_beams=3)" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Summarization of Wikipedia articles\n", | ||
"Now let's use the Wikipedia API to search for some long text that can be summarized:" | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import wikipedia\n", | ||
"\n", | ||
"# TRY IT YOURSELF BY CHANGING THE PAGE TITLE BELOW\n", | ||
"page_title = \"Queen (band)\"\n", | ||
"page_title = \"Queen(band)\"\n", | ||
"text = wikipedia.page(page_title).content\n", | ||
"text" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"summarizer(\n", | ||
" text, # NOTE: the input text would be truncated to max_input_length=1024\n", | ||
" max_length=150,\n", | ||
" num_beams=3,\n", | ||
")" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Summarization of medical health records\n", | ||
"The summarization task may be also useful in summarising medical health records (MHR). Let's import an open source dataset with some medical samples." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from datasets import load_dataset\n", | ||
"\n", | ||
"dataset = load_dataset(\"rungalileo/medical_transcription_40\")\n", | ||
"dataset" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We focus on the medical report labeled as \"text\" and from the training dataset select a random patient ID." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import random\n", | ||
"\n", | ||
|
@@ -277,48 +279,46 @@ | |
"\n", | ||
"exemplary_medical_report = dataset[\"train\"][random_patient_id][\"text\"]\n", | ||
"exemplary_medical_report" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"summarizer(exemplary_medical_report, max_length=150, num_beams=3)" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Optional - Release IPUs in use\n", | ||
"\n", | ||
"The IPython kernel has a lock on the IPUs used to run the model, preventing other users from using them. For example, if you wish to use other notebooks after working your way through this one, it may be necessary to manually run the below cell to release IPUs from use. This will happen by default if you use the \"Run All\" option. More information on the topic can be found at [Managing IPU Resources](https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/useful-tips/managing_ipu_resources.ipynb)." | ||
], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"summarizer.model.detachFromDevice()" | ||
], | ||
"outputs": [], | ||
"metadata": {} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Conclusions and next steps\n", | ||
"\n", | ||
"This notebook demonstrated running a text summarization task on Graphcore IPUs, with BART-L using an inference pipeline from Optimum Graphcore.\n", | ||
"\n", | ||
"Try out the other [IPU-powered Jupyter Notebooks](https://www.graphcore.ai/ipu-jupyter-notebooks) to see how IPUs perform on other tasks." | ||
], | ||
"metadata": {} | ||
] | ||
} | ||
], | ||
"metadata": { | ||
|
@@ -342,4 +342,4 @@ | |
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} | ||
} |