💡Light Bulb is a tool to help you label, train, test and deploy machine learning models without any coding.

czhu12/light_bulb

💡 Light Bulb

Light Bulb is a labeling tool built with state-of-the-art active learning and semi-supervised learning techniques. It currently supports text classification and image classification.

See the Medium post here.

Getting Started

Mac OSX

brew install yarn
git clone https://github.com/czhu12/labelling-tool && cd labelling-tool
make

Cat / not cat demo dataset

make dataset/cat_not_cat # Download and set up dataset.
./bin/run config/examples/cat_not_cat.yml # Server set up on localhost:5000

Small IMDB reviews dataset

make dataset/small_imdb_reviews # Download and set up dataset.
./bin/run config/examples/small_imdb_reviews.yml # Server set up on localhost:5000

Usage

Configuration

Here's an example configuration:

task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"
dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv
label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin
    - Skip
model:
  directory: outputs/image_multiclass_classification/models/
user: chris

task

task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"

dataset

dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv

judgements_file defines the file that the labels are saved in.

data_type defines what type of model is used. Valid options are images and text.

label

label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin

type defines the type of label. Valid options are classification and binary.

model

model:
  directory: outputs/image_multiclass_classification/models/

directory defines where the trained model is saved.

user

user: chris

user identifies the person doing the labeling, which may be useful when the labels are used later.
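
As a quick sanity check before starting the server, the fields described above can be validated once the YAML has been parsed into a dictionary. The following is only a sketch: `validate_config` is a hypothetical helper, not part of Light Bulb, and the assumption that all five top-level sections are required is ours rather than something the tool documents.

```python
# Hypothetical validator. The key names mirror the example configuration,
# but the checks themselves are assumptions, not Light Bulb's own logic.
VALID_DATA_TYPES = {"images", "text"}
VALID_LABEL_TYPES = {"classification", "binary"}


def validate_config(config):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for section in ("task", "dataset", "label", "model", "user"):
        if section not in config:
            problems.append(f"missing section: {section}")

    dataset = config.get("dataset", {})
    if dataset.get("data_type") not in VALID_DATA_TYPES:
        problems.append("dataset.data_type must be 'images' or 'text'")
    if not dataset.get("judgements_file"):
        problems.append("dataset.judgements_file is required")

    label = config.get("label", {})
    if label.get("type") not in VALID_LABEL_TYPES:
        problems.append("label.type must be 'classification' or 'binary'")
    if label.get("type") == "classification" and not label.get("classes"):
        problems.append("label.classes must list at least one class")

    return problems


# The example configuration from above, as it would look after YAML parsing.
example = {
    "task": {"title": "What kind of animal is this?"},
    "dataset": {
        "directory": "dataset/image_classification/",
        "data_type": "images",
        "judgements_file": "outputs/image_multiclass_classification/labels.csv",
    },
    "label": {"type": "classification", "classes": ["Dog", "Cat", "Skip"]},
    "model": {"directory": "outputs/image_multiclass_classification/models/"},
    "user": "chris",
}

print(validate_config(example))  # prints []
```

An empty list means the configuration passed every check in this sketch; it does not guarantee that Light Bulb itself will accept the file.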

Example Text Classification

To run the text classification demo:

make dataset/small_imdb_reviews
./bin/run config/examples/small_imdb_reviews.yml

Example Image Classification

To run the image classification demo:

make dataset/cat_not_cat
./bin/run config/examples/cat_not_cat.yml

How It Works

Architecture

Encoder Decoder

Most deep learning tasks can be framed as an encoder-decoder architecture. For example, text classification can be framed as an LSTM encoder feeding a logistic regression decoder, and object detection as a ResNet encoder with a regression decoder. All models in Light Bulb follow this encoder-decoder framing: the encoders are pretrained on an external dataset (ImageNet for images, WikiText-103 for text) and then fine-tuned on the target dataset.
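
To make the split concrete, here is a deliberately tiny sketch of the same shape. It is not Light Bulb's code: a bag-of-words counter stands in for the LSTM encoder, and the vocabulary, weights, and function names are all made up for illustration. The point is only that the decoder sees a fixed-length feature vector and does not care how the encoder produced it.

```python
import math


def encode(text, vocabulary):
    """Toy encoder: map text to a fixed-length bag-of-words count vector.

    A stand-in for a pretrained LSTM encoder; only the interface matters here.
    """
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]


def decode(features, weights, bias):
    """Logistic regression decoder: features -> probability of the positive class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical vocabulary and hand-picked weights, for illustration only.
vocabulary = ["great", "terrible"]
weights = [2.0, -2.0]
bias = 0.0

positive = decode(encode("A great great movie", vocabulary), weights, bias)
negative = decode(encode("A terrible movie", vocabulary), weights, bias)
print(positive > 0.5, negative < 0.5)  # prints True True
```

Swapping in a stronger encoder changes only `encode`; the decoder and the code that trains it can stay the same.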

Semi Supervised Text

Encoder Decoder

Light Bulb's text encoder is a language model pretrained on WikiText-103 (inspired by ULMFiT), with a vocabulary limited to the 100k most frequent words in the corpus. The model is then fine-tuned on the target dataset as a language model.
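
Capping a vocabulary at the most frequent words is usually done with a frequency count plus a catch-all token for everything else. The sketch below shows that idea on a toy corpus; `build_vocab`, `numericalize`, and the `<unk>` token are illustrative names, and whether Light Bulb reserves an extra slot for unknown words is an assumption.

```python
from collections import Counter

UNK = "<unk>"  # assumed placeholder for out-of-vocabulary words


def build_vocab(tokens, max_size):
    """Map the max_size most frequent tokens to integer ids; id 0 is reserved for UNK."""
    counts = Counter(tokens)
    most_common = [word for word, _ in counts.most_common(max_size)]
    return {word: idx for idx, word in enumerate([UNK] + most_common)}


def numericalize(tokens, vocab):
    """Replace each token with its id, falling back to the UNK id."""
    return [vocab.get(token, vocab[UNK]) for token in tokens]


corpus = "the cat sat on the mat the cat".split()
vocab = build_vocab(corpus, max_size=2)  # Light Bulb's stated limit is 100k
print(vocab)                                      # {'<unk>': 0, 'the': 1, 'cat': 2}
print(numericalize("the dog sat".split(), vocab))  # [1, 0, 0]
```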

Semi Supervised Image

SqueezeNet

Light Bulb uses SqueezeNet pretrained on the ImageNet dataset to encode image data. The encoder is fine-tuned as an autoencoder on the target dataset that is to be labeled. Standard image augmentation techniques are used to expand the labeled training set.
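
The README does not say which augmentations are applied, so the following is only an illustration of the general idea using one common choice, a horizontal flip. Images are represented as nested lists of pixel values to keep the sketch dependency-free, and both function names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def expand_labeled_set(labeled):
    """Return the original (image, label) pairs plus a flipped copy of each.

    The label is unchanged because a mirrored cat is still a cat.
    """
    expanded = []
    for image, label in labeled:
        expanded.append((image, label))
        expanded.append((horizontal_flip(image), label))
    return expanded


labeled = [([[1, 2], [3, 4]], "Cat")]
for image, label in expand_labeled_set(labeled):
    print(image, label)
# [[1, 2], [3, 4]] Cat
# [[2, 1], [4, 3]] Cat
```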

Active Learning

Light Bulb trains a model as you provide training data through labeling. To choose what to show you next, it scores the unlabeled items with the current model and samples the items with the highest prediction entropy, that is, the ones the model is least certain about.
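
A minimal sketch of entropy-based selection, assuming the model outputs a probability for each class per item. The function names and the example file names are made up, and Light Bulb's actual sampling code may differ (for example, it may sample randomly among high-entropy items rather than take a strict top k).

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, k):
    """Return the ids of the k unlabeled items with the highest entropy.

    predictions maps an item id to its list of predicted class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda item: entropy(predictions[item]),
        reverse=True,
    )
    return ranked[:k]


predictions = {
    "a.jpg": [0.98, 0.02],  # confident, low entropy
    "b.jpg": [0.55, 0.45],  # nearly a coin flip, high entropy
    "c.jpg": [0.80, 0.20],
}
print(select_for_labeling(predictions, 2))  # ['b.jpg', 'c.jpg']
```

Labeling the uncertain items first tends to give the model more information per label than labeling items it already classifies confidently.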

Coming Soon

  • Sequence Tagging
  • Object Detection
  • Sequence to Sequence Modeling
  • Dockerize Application
