Light Bulb is a labeling tool built with state-of-the-art active learning and semi-supervised learning techniques. It currently supports text classification and image classification.
See the Medium post here.
Install the dependencies and build:

```bash
brew install yarn
git clone https://github.com/czhu12/labelling-tool && cd labelling-tool
make
```

Then try one of the example tasks:

```bash
make dataset/cat_not_cat                    # Download and set up the dataset.
./bin/run config/examples/cat_not_cat.yml   # Server is set up on localhost:5000
```

```bash
make dataset/small_imdb_reviews                    # Download and set up the dataset.
./bin/run config/examples/small_imdb_reviews.yml   # Server is set up on localhost:5000
```
Here's an example configuration:
```yaml
task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"
dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv
label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin
    - Skip
model:
  directory: outputs/image_multiclass_classification/models/
user: chris
```
The `task` section defines the prompt shown to labelers:

```yaml
task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"
```

The `dataset` section describes the data to be labeled:

```yaml
dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv
```

`judgements_file` defines the file that the labels are saved in. `data_type` defines what type of model is used; valid options are `images` and `text`.
```yaml
label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin
```

`type` defines the type of label; options are `classification` and `binary`.
```yaml
model:
  directory: outputs/image_multiclass_classification/models/
```

`directory` defines where the trained model is saved.

```yaml
user: chris
```

`user` identifies the person doing the labeling, which may be useful when the labels are used later.
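For reference, a config like this can be read with PyYAML. The sketch below is illustrative only (the filename `task.yml` is hypothetical, and this is not Light Bulb's own loading code):

```python
import yaml

# Parse the YAML config shown above (saved as a hypothetical task.yml).
with open("task.yml") as f:
    config = yaml.safe_load(f)

print(config["task"]["title"])         # What kind of animal is this?
print(config["dataset"]["data_type"])  # images
print(config["label"]["classes"])      # ['Dog', 'Cat', 'Giraffe', 'Dolphin', 'Skip']
```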
To run the text classification demo:

```bash
make dataset/small_imdb_reviews
./bin/run config/examples/small_imdb_reviews.yml
```

To run the image classification demo:

```bash
make dataset/cat_not_cat
./bin/run config/examples/cat_not_cat.yml
```
Most deep learning tasks can be framed as an encoder-decoder architecture. For example, text classification can be framed as an LSTM encoder that feeds into a logistic regression decoder, and object detection can be framed as a ResNet encoder with a regression decoder. All models in Light Bulb follow this encoder-decoder structure: the encoder is pre-trained on an external dataset (ImageNet for images, WikiText-103 for text), then fine-tuned on the target dataset.
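A minimal sketch of that pattern in PyTorch (illustrative only, not Light Bulb's internals): a frozen pretrained encoder composed with a small task-specific decoder.

```python
import torch
import torch.nn as nn
from torchvision import models

# Encoder: SqueezeNet's convolutional features, pretrained on ImageNet.
encoder = models.squeezenet1_1(pretrained=True).features
for p in encoder.parameters():
    p.requires_grad = False  # freeze the encoder; only the decoder trains

# Decoder: pooled features into a logistic-regression-style classifier head
# (5 outputs for the Dog/Cat/Giraffe/Dolphin/Skip example above).
decoder = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 5),  # SqueezeNet 1.1 features have 512 channels
)

model = nn.Sequential(encoder, decoder)
logits = model(torch.randn(1, 3, 224, 224))  # one dummy RGB image
```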
Light Bulb's text encoder is a language model pretrained on WikiText-103 (inspired by ULMFiT), with a vocabulary limited to the 100k most frequent words in the corpus. The encoder is then fine-tuned on the target dataset as a language model.
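That 100k-word cap might look like the following sketch (the `<unk>` handling and tokenization are assumptions, not Light Bulb's actual preprocessing):

```python
from collections import Counter

def build_vocab(tokenized_docs, max_size=100_000):
    """Keep the max_size most frequent words; everything else maps to <unk>."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab

docs = [["the", "movie", "was", "great"], ["the", "movie", "was", "awful"]]
vocab = build_vocab(docs)
ids = [vocab.get(tok, vocab["<unk>"]) for tok in docs[0]]  # [1, 2, 3, 4]
```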
Light Bulb uses SqueezeNet pretrained on ImageNet to encode image data. The encoder is fine-tuned as an autoencoder on the target dataset to be labeled. Standard image augmentation techniques are used to expand the labeled training set.
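For example, a typical augmentation pipeline of this kind in torchvision (the exact transforms Light Bulb applies are not specified here):

```python
from torchvision import transforms

# Random crops, flips, and color jitter multiply the effective size
# of a small labeled image set.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```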
Light Bulb trains a model as you provide training data through labeling. To decide what to show you next, it scores the unlabeled items and samples those with the highest prediction entropy, i.e. the items the model is most uncertain about.
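A minimal sketch of that highest-entropy selection (the probability matrix is made up for illustration; this is not Light Bulb's sampling code):

```python
import numpy as np

def highest_entropy_indices(probs, k):
    """probs: (n_items, n_classes) predicted class probabilities per unlabeled item."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:k]  # indices of the k most uncertain items

probs = np.array([
    [0.98, 0.02],  # confident -> low entropy
    [0.55, 0.45],  # uncertain -> high entropy
    [0.70, 0.30],
])
print(highest_entropy_indices(probs, k=2))  # [1 2]
```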
Planned features:

- Sequence Tagging
- Object Detection
- Sequence to Sequence Modeling
- Dockerize Application