IKT590

README may be outdated.

Setup

git clone https://github.com/vetleledaal/ikt590.git vetle_ikt590
cd vetle_ikt590
pip install poetry # alternatively use pipx
poetry install --with=dev

Note: you need poetry 1.2 or later. Check with poetry --version.
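
The version requirement above can also be checked from a script. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that `poetry --version` prints a dotted version number somewhere in its output.

```python
import re
import subprocess


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def poetry_is_recent_enough(minimum=(1, 2, 0)):
    """Run `poetry --version` and compare the result against `minimum`."""
    result = subprocess.run(
        ["poetry", "--version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout) >= minimum
```

Calling `poetry_is_recent_enough()` requires poetry to be on the PATH; `parse_version` can be used on its own with any version string.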

If TMU or pycuda fails to install, install them manually inside the poetry environment to see the full error:

poetry run pip install git+https://github.com/cair/tmu.git
poetry run pip install pycuda
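
To see which of the two packages is actually missing, a small check like the following may help. It is a sketch only: the import names `tmu` and `pycuda` are assumed from the package names, and the helper is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "tmu" and "pycuda" are the assumed import names of the two packages.
print(missing_modules(["tmu", "pycuda"]))
```

Run it with `poetry run python` so that it inspects the poetry environment rather than the system interpreter. An empty list means both modules were found.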

Datasets

You also need to download the datasets (from project root):

git clone https://github.com/KaiDMML/FakeNewsNet.git
wget https://github.com/Gautamshahi/FakeCovid/raw/master/data/FakeCovid_July2020.csv
git clone https://github.com/hate-alert/HateXplain.git
git clone https://github.com/pmacinec/fake-news-datasets.git
# Download from Kaggle: https://www.kaggle.com/datasets/mrmorj/hate-speech-and-offensive-language-dataset
git clone https://github.com/Vicomtech/hate-speech-dataset.git
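
A quick way to confirm that the downloads above landed in the project root is sketched below. The paths are assumptions: they are the default names that `git clone` and `wget` produce for the URLs above. The Kaggle dataset is left out because its local file name is not stated here.

```python
from pathlib import Path

# Assumed default names created by the clone/wget commands above.
EXPECTED_PATHS = [
    "FakeNewsNet",
    "FakeCovid_July2020.csv",
    "HateXplain",
    "fake-news-datasets",
    "hate-speech-dataset",
]


def missing_datasets(root="."):
    """Return the expected dataset paths that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_PATHS if not (root / name).exists()]


# Run from the project root; an empty list means everything was found.
print(missing_datasets())
```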

How to use

FakeNewsNet supports the features:

  • text
  • domain
  • tweet

FakeCovid supports the features:

  • text
  • domain

Feature permutations:

poetry run python pipeline.py --dataset FakeNewsNet --feature all
poetry run python pipeline.py --dataset FakeNewsNet --feature text
poetry run python pipeline.py --dataset FakeNewsNet --feature text domain
poetry run python pipeline.py --dataset FakeNewsNet --feature text domain tweet
poetry run python pipeline.py --dataset FakeNewsNet --feature domain
poetry run python pipeline.py --dataset FakeNewsNet --feature domain tweet
poetry run python pipeline.py --dataset FakeNewsNet --feature tweet

poetry run python pipeline.py --dataset FakeCovid --feature all
poetry run python pipeline.py --dataset FakeCovid --feature text
poetry run python pipeline.py --dataset FakeCovid --feature text domain
poetry run python pipeline.py --dataset FakeCovid --feature domain
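
The command lists above can also be generated instead of typed out. The sketch below is an illustration, not part of the repository. It enumerates every non-empty feature subset per dataset, so for FakeNewsNet it also produces `text tweet`, which the list above does not show, and it spells out the full set instead of using `all`.

```python
from itertools import combinations

# Features per dataset, as listed above.
FEATURES = {
    "FakeNewsNet": ["text", "domain", "tweet"],
    "FakeCovid": ["text", "domain"],
}


def feature_commands(dataset):
    """Build one pipeline.py command line per non-empty feature subset."""
    features = FEATURES[dataset]
    commands = []
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            commands.append(
                ["poetry", "run", "python", "pipeline.py",
                 "--dataset", dataset, "--feature", *subset]
            )
    return commands


for dataset in FEATURES:
    for command in feature_commands(dataset):
        print(" ".join(command))
```

Each command is a list of arguments, so it can be passed directly to `subprocess.run` to launch the runs one after another.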

All arguments:

usage: pipeline.py [-h] [--num-clauses NUM_CLAUSES] [--T T] [--s S] [--epochs EPOCHS] [--device DEVICE] [--seed SEED] [--dataset {FakeNewsNet,FakeCovid}]
                   [--feature {all,text,domain,tweet} [{all,text,domain,tweet} ...]] [--test-size TEST_SIZE] [--malformed {fix,drop}] [--max-vocab MAX_VOCAB] [--max-domain MAX_DOMAIN]
                   [--max-tweet MAX_TWEET]

optional arguments:
  -h, --help            show this help message and exit
  --num-clauses NUM_CLAUSES
  --T T
  --s S
  --epochs EPOCHS
  --device DEVICE
  --seed SEED
  --dataset {FakeNewsNet,FakeCovid}
  --feature {all,text,domain,tweet} [{all,text,domain,tweet} ...]
  --test-size TEST_SIZE
  --malformed {fix,drop}
  --max-vocab MAX_VOCAB
  --max-domain MAX_DOMAIN
  --max-tweet MAX_TWEET

Defaults:

--num-clauses: 5000
--T: 100
--s: 10.0
--epochs: 100
--device: GPU
--seed: 42
--dataset: FakeNewsNet
--feature: all
--test-size: 0.2
--malformed: fix
--max-vocab: 3000
--max-domain: 500
--max-tweet: 500
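
For reference, the usage text and defaults above correspond to an argparse parser roughly like the one below. This is a reconstruction from the help output, not the code in pipeline.py: the argument types, and the choice to store the `--feature` default as a list, are assumptions.

```python
import argparse

FEATURES = ["all", "text", "domain", "tweet"]


def build_parser():
    """Rebuild a parser matching the usage text; types are assumed."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    parser.add_argument("--num-clauses", type=int, default=5000)
    parser.add_argument("--T", type=int, default=100)
    parser.add_argument("--s", type=float, default=10.0)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--device", default="GPU")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--dataset", choices=["FakeNewsNet", "FakeCovid"],
                        default="FakeNewsNet")
    # nargs="+" lets several features follow a single --feature flag.
    parser.add_argument("--feature", choices=FEATURES, nargs="+", default=["all"])
    parser.add_argument("--test-size", type=float, default=0.2)
    parser.add_argument("--malformed", choices=["fix", "drop"], default="fix")
    parser.add_argument("--max-vocab", type=int, default=3000)
    parser.add_argument("--max-domain", type=int, default=500)
    parser.add_argument("--max-tweet", type=int, default=500)
    return parser


args = build_parser().parse_args(["--dataset", "FakeCovid", "--feature", "text", "domain"])
print(args.dataset, args.feature, args.num_clauses)
```

Options with a dash in the name, such as `--num-clauses`, are exposed with an underscore on the parsed namespace (`args.num_clauses`).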

Possible issues

ModuleNotFoundError: No module named 'tensorflow'

If you encounter this error on Windows, it may be due to a lack of LongPaths support. You can enable LongPaths by running the following command in an administrative PowerShell terminal:

Set-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name 'LongPathsEnabled' -Value 1

LongPaths are only supported in Windows 10 version 1607 or later.

After enabling LongPaths, update tensorflow within the poetry environment:

poetry run pip install tensorflow -U
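
To confirm that the LongPaths setting described above took effect, the registry value can be read back with Python's standard `winreg` module. This is a sketch with a hypothetical helper name; it returns None on systems other than Windows.

```python
import sys


def long_paths_enabled():
    """Return True/False for the LongPathsEnabled value, or None off Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, only available on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except FileNotFoundError:
        # The key or value is absent, so the setting was never enabled.
        return False
    return value == 1


print(long_paths_enabled())
```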

pycuda fails to install

On Windows, pycuda may fail to install if Microsoft Visual C++ 14.0 or later is not installed. Installing "Microsoft Visual C++ Build Tools" provides it.

On Linux, one possible error is that nvcc from CUDA 11.2 rejects GCC versions later than 10. No fix yet:

pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmp0jmwjg0w.cu failed
[command: nvcc --preprocess -arch sm_52 -I/home/vetle/.cache/pypoetry/virtualenvs/thesis-g87dtkxN-py3.8/lib64/python3.8/site-packages/pycuda/cuda /tmp/tmp0jmwjg0w.cu --compiler-options -P]
[stderr:
b"In file included from /usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/cuda_runtime.h:83,\n                 from <command-line>:\n/usr/local/cuda-11.2/bin/../targets/x86_64-linux/include/crt/host_config.h:139:2: error: #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.\n  139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.\n      |  ^~~~~\n"]
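
Whether a machine is affected by the error above can be checked before installing. The sketch below only diagnoses the condition and is not a fix. The limit of 10 is taken from the error message for CUDA 11.2, other CUDA releases may accept different GCC versions, and the helper names are hypothetical.

```python
import re
import subprocess

# Limit quoted in the error message above for CUDA 11.2 (assumption:
# other CUDA releases may differ).
MAX_SUPPORTED_GCC_MAJOR = 10


def gcc_major(version_text):
    """Parse the major version from output such as '11' or '11.2.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"cannot parse GCC version from {version_text!r}")
    return int(match.group(1))


def gcc_too_new(version_text, limit=MAX_SUPPORTED_GCC_MAJOR):
    """Return True when the GCC major version exceeds the nvcc limit."""
    return gcc_major(version_text) > limit


def installed_gcc_version():
    """Return the output of `gcc -dumpversion`; requires gcc on the PATH."""
    result = subprocess.run(
        ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine with gcc installed, `gcc_too_new(installed_gcc_version())` returning True suggests that the nvcc version check shown above would be triggered.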

Datasets

Implemented

  • FakeNewsNet
  • FakeNewsNet-politifact
  • FakeNewsNet-gossipcop
  • FakeCovid
  • HateXPlain
  • HateXPlain-binary
  • fake-news-datasets-deception-FakeNewsAMT
  • fake-news-datasets-deception-Celebrity
  • fake-news-datasets-Election-Day
  • fake-news-datasets-deception-FakeNewsChallenge
  • fake-news-datasets-deception-FakeNewsChallenge-body
  • fake-news-datasets-deception-FakeNewsCorpus
  • fake-news-datasets-deception-FakeNewsCorpus-body
  • hate-speech-dataset

To be implemented

  • fake-news-datasets: Fake News detection - Kaggle
  • fake-news-datasets: Fake News - Kaggle
  • fake-news-datasets: Fake News vs Satire
  • fake-news-datasets: Fakeddit
  • fake-news-datasets: FEVER
  • fake-news-datasets: GeorgeMcIntire/fake_real_news_dataset
  • fake-news-datasets: Getting real about Fake News - Kaggle
  • fake-news-datasets: HoaxDataset
  • fake-news-datasets: LIAR
  • fake-news-datasets: Misinfofinder
  • fake-news-datasets: OpenSources
  • fake-news-datasets: PHEME
  • fake-news-datasets: This Just In

Excluded (non-Latin)

  • fake-news-datasets: BanFakeNews
  • fake-news-datasets: Detecting Rumors Microblogs
  • fake-news-datasets: EANN-KDD18
  • fake-news-datasets: Hack the Fake News
  • fake-news-datasets: Monant API
  • fake-news-datasets: News Credibility
  • fake-news-datasets: WeFEND-AAAI20
  • fake-news-datasets: WSDM - Fake News Classification - Kaggle

Excluded (other)

  • fake-news-datasets: BuzzFeedNews Facebook Facts (no text)
  • fake-news-datasets: CREDBANK (account suspended from AWS)