Add dictionaries from the tibetan-dictionary repo (resolves #75 and #51) #83

Closed



@dignissimus dignissimus commented Oct 13, 2019

I ran the following script as ./add_dictionaries.sh ../tibetan-dictionary/_input/dictionaries/public/ dicts/

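#!/usr/bin/env bash
# Copy each dictionary from SOURCE_LOCATION into TARGET_LOCATION,
# stripping the numeric prefix and appending a .txt extension.
SOURCE_LOCATION="$1"
TARGET_LOCATION="$2"

for file in "$SOURCE_LOCATION"/*
do
    # e.g. 23-GatewayToKnowledge -> GatewayToKnowledge.txt
    new_name="$(basename "$file" | sed -E 's/[0-9]+-(.+)/\1.txt/')"
    # -n: do not overwrite a file that already exists in the target
    cp -n "$file" "$TARGET_LOCATION/$new_name"
done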
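
A dry run can help check the renaming before any files are copied. The following is a minimal sketch, not part of the original script: the function names target_name and preview_copies are hypothetical, and it assumes the source files are named like 23-GatewayToKnowledge.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the target file name with the same sed
# expression that the copy script uses.
target_name() {
    basename "$1" | sed -E 's/[0-9]+-(.+)/\1.txt/'
}

# Hypothetical helper: print what would be copied, and flag targets that
# already exist (cp -n would not overwrite those).
preview_copies() {
    local source_dir="$1" target_dir="$2" file new_name
    for file in "$source_dir"/*; do
        [ -f "$file" ] || continue
        new_name="$(target_name "$file")"
        if [ -e "$target_dir/$new_name" ]; then
            echo "SKIP $file -> $target_dir/$new_name (already exists)"
        else
            echo "COPY $file -> $target_dir/$new_name"
        fi
    done
}

# Usage: ./preview.sh SOURCE_DIR TARGET_DIR
if [ "$#" -eq 2 ]; then
    preview_copies "$1" "$2"
fi
```

Nothing is written to disk here, so the output can be reviewed before running the real copy.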

Resolves #75 and #51



@willbasky
Owner

Thank you!

I have a couple of questions.

  1. Why run this script if you have already added dictionaries to the main folder?
  2. What about other dictionaries from https://github.com/christiansteinert/tibetan-dictionary/tree/master/_input/dictionaries/public_en that are from English to Tibetan (Sanskrit)?

I see that some of the English dictionaries have the same names as the Tibetan ones, so I will extend the meta description to handle this.


@dignissimus
Author

Why run this script if you have already added dictionaries to the main folder?

I wrote the script to automate copying over the dictionaries, then ran it to add the dictionaries to the main folder.

What about other dictionaries from https://github.com/christiansteinert/tibetan-dictionary/tree/master/_input/dictionaries/public_en that are from English to Tibetan (Sanskrit)?

All the dictionaries in the public_en folder are inside the public folder, so there was no need to do both
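
A claim like this can be checked by name with a few lines of shell. This is a minimal sketch; the function name missing_from is hypothetical, and it compares file names only, not content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every file name that exists in dir_a
# but has no same-named entry in dir_b.
missing_from() {
    local dir_a="$1" dir_b="$2" file name
    for file in "$dir_a"/*; do
        [ -e "$file" ] || continue
        name="$(basename "$file")"
        [ -e "$dir_b/$name" ] || echo "$name"
    done
}

# Example call (prints nothing if every name in public_en is also in public):
# missing_from tibetan-dictionary/_input/dictionaries/public_en \
#              tibetan-dictionary/_input/dictionaries/public
```

Empty output only shows that the names overlap; same-named files can still differ in content.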

@willbasky
Owner

All the dictionaries in the public_en folder are inside the public folder, so there was no need to do both

Some of them or all of them have the same name with different content.

@dignissimus
Author

dignissimus commented Oct 14, 2019

Some of them or all of them have the same name with different content.

Ah, which ones should be kept? The public_en ones?

@dignissimus
Author

I ran ./check_duplicates.sh tibetan-dictionary/_input/dictionaries/public tibetan-dictionary/_input/dictionaries/public_en
with the following script:

#!/usr/bin/env bash
# Compare files that share a name in two directories and report
# whether their contents match (by MD5) and, if not, which is larger.
SOURCE_DIRECTORY="$1"
TARGET_DIRECTORY="$2"

for source_file in "$SOURCE_DIRECTORY"/*
do
    b_name=$(basename "$source_file")
    target_file="$TARGET_DIRECTORY/$b_name"
    # Only files present in both directories are compared
    if [ -f "$target_file" ]; then
        source_hash=$(md5sum "$source_file" | cut -d ' ' -f 1)
        target_hash=$(md5sum "$target_file" | cut -d ' ' -f 1)
        if [ "$source_hash" = "$target_hash" ]; then
            echo "$b_name is a duplicate in both directories ($source_hash)"
        else
            # Read from stdin so that wc prints only the byte count
            source_size=$(wc -c < "$source_file")
            target_size=$(wc -c < "$target_file")
            echo "$b_name is not a duplicate in both directories ($source_hash, $target_hash)"
            if ((source_size > target_size)); then
                echo " - The file from the first directory is larger"
            elif ((source_size < target_size)); then
                echo " - The file from the second directory is larger"
            else
                echo " - Both files are the same size"
            fi
        fi
    fi
done

All the files from the public_en directory except for 23-GatewayToKnowledge are larger than their respective counterparts in the public directory, so should I keep the larger files?

@willbasky
Owner

Wait a bit. I am reworking the meta so that new dictionaries can be added.

should I keep the larger files?

They are different, so we need to keep all of them by renaming them. We need to take stock of them carefully.
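
Keeping both versions by renaming could look roughly like this. It is only a sketch under assumptions: the function name copy_keeping_both and the _en suffix in the example call are hypothetical, and the naming that the titles file actually expects is not specified in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every file from source_dir into target_dir.
# When target_dir already holds a same-named file with different content,
# the incoming file is stored under the name plus the given suffix, so
# both versions are kept.
copy_keeping_both() {
    local source_dir="$1" target_dir="$2" suffix="$3" file name
    for file in "$source_dir"/*; do
        [ -f "$file" ] || continue
        name="$(basename "$file")"
        if [ ! -e "$target_dir/$name" ]; then
            cp "$file" "$target_dir/$name"
        elif cmp -s "$file" "$target_dir/$name"; then
            echo "$name: identical copy already present, skipped"
        else
            cp "$file" "$target_dir/$name$suffix"
            echo "$name: content differs, kept as $name$suffix"
        fi
    done
}

# Example call with a hypothetical suffix:
# copy_keeping_both public_en merged _en
```

The printed list of renamed files gives a starting point for taking stock of which dictionaries exist in two versions.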

@willbasky
Owner

Hey! I have just pushed some changes to the titles file in #84.
The path format is now explicit, so we can add new dictionaries with similar names.

This can be used for issue #75.

I am not sure about #51 if the dictionaries are added by simply replacing the existing ones, because I fixed some wrong formatting in some of them.

@dignissimus
Author

dignissimus commented Oct 26, 2019

Hey! What do "T|E" and "T|S" stand for in the file names?
Edit: Also, what changes should I add to this?

@willbasky
Owner

willbasky commented Oct 27, 2019

What do "T|E" and "T|S" stand for in the file names?

Tibetan | English and Tibetan | Sanskrit

Also, what changes should I add to this?

When you add a new dictionary, info about it should be added to the titles file.
When you add a renewed version of an existing dictionary, it is more complex if there are changes on both the hibet side and the importing side. The differences must be taken into account.

@willbasky willbasky closed this Jun 18, 2020