Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port codebase to Python3 #2

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

shrianshChari
Copy link
Contributor

@shrianshChari shrianshChari commented Aug 23, 2024

Includes updating print statements and replacing the maintenance-only ujson with orjson.

@monsanto
Copy link
Member

When I did the PyPy conversion, I remember just using the json library directly. It seems like that's the case here too? ujson lines are commented out. So I don't know why we need another dependency at all, unless it has a proven speed-up.

TierUpdate9Doubles.py Outdated Show resolved Hide resolved
@shrianshChari
Copy link
Contributor Author

When I did the PyPy conversion, I remember just using the json library directly. It seems like that's the case here too? ujson lines are commented out. So I don't know why we need another dependency at all, unless it has a proven speed-up.

orjson is faster than json according to online sources, but I guess we can use it later if it is necessary

StatCounter.py Outdated Show resolved Hide resolved
batchLogReader.py Outdated Show resolved Hide resolved
@monsanto
Copy link
Member

When I did the PyPy conversion, I remember just using the json library directly. It seems like that's the case here too? ujson lines are commented out. So I don't know why we need another dependency at all, unless it has a proven speed-up.

orjson is faster than json according to online sources, but I guess we can use it later if it is necessary

Yes please, if we weren't using anything but the json library to begin with, this isn't the time for optimization

StatCounter.py Outdated Show resolved Hide resolved
@@ -1,7 +1,6 @@
name: sus-env2
name: sus-env3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is environment.yml for anyway? Is this still Conda garbage? I thought I got rid of that for PyPy

Did you need to use this to install? It would be nice if this worked without dependencies, but if we need dependencies, maybe moving to poetry, that's what I did last time I modernized Python code

Clarifying the installation would be good, since we'll have to change that when we resetup the server environment to run this with python 3

Copy link
Contributor Author

@shrianshChari shrianshChari Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed you were using conda to get Python 2 running, but I can get PyPy set up. We will need dependencies because it uses js2py to grab some data from Pokemon Showdown. I'll just leave it as conda for now, but we can move it to PyPy/poetry later on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marty-D do you remember the deal with this? I feel like I got rid of this dependency at some point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't get rid of it. Either way I don't know how to use Conda. We should have instructions on how to use it in the README. We don't have to switch to something else in this PR if it works OK.

Copy link
Member

@monsanto monsanto Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: PyPy, my understanding is that it's not as useful with recent versions of Python now that it has its own JIT. So just having it work in whatever configuration is used on the server is fine. I'm not sure my changes ever got used in the main workflow anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Marty-D do you remember the deal with this? I feel like I got rid of this dependency at some point

I'm not sure, I wanted to use your changes on the new sim server but couldn't find how to configure PyPy on nix, or if I even had the right permissions to do so. Didn't have time to waste getting sh or whoever to do it so I just used the old conda stuff, which I feel like ended up taking longer to figure out than in 2019.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
git clone https://github.com/Marty-D/Smogon-Usage-Stats
cd Smogon-Usage-Stats/
conda env create
tmux attach
cd Smogon-Usage-Stats/
conda env create
cd ..
source ~/.bashrc
conda --version
cd Smogon-Usage-Stats/
tmux attach
  source activate sus-env2
  conda info --envs
    # conda environments:
    #
    base                     /home/marty/miniconda3
    sus-env2              *  /home/marty/miniconda3/envs/sus-env2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I didn't get rid of it. Either way I don't know how to use Conda. We should have instructions on how to use it in the README. We don't have to switch to something else in this PR if it works OK.

This can happen in a different PR then.

@monsanto
Copy link
Member

The rest looks OK to me. Thanks for taking this on

@monsanto
Copy link
Member

monsanto commented Aug 23, 2024

If you could revert the load -> loads + read type changes as well that would be good, it's unclear if this has the same performance characteristics at a glance (I don't know what these functions do internally; they might not buffer the entire whole thing into memory)

@monsanto
Copy link
Member

And also a comment on the trainer name unicode stuff, what you had before was sensible and the obvious thing to do, so down the line it might not be clear to someone doing further cleanups why this is the way it is

@shrianshChari
Copy link
Contributor Author

If you could revert the load -> loads + read type changes as well that would be good, it's unclear if this has the same performance characteristics at a glance (I don't know what these functions do internally; they might not buffer the entire whole thing into memory)

Did

@shrianshChari
Copy link
Contributor Author

And also a comment on the trainer name unicode stuff, what you had before was sensible and the obvious thing to do, so down the line it might not be clear to someone doing further cleanups why this is the way it is

Did

batchLogReader.py Outdated Show resolved Hide resolved
@monsanto monsanto marked this pull request as ready for review August 24, 2024 18:38
@monsanto
Copy link
Member

monsanto commented Aug 24, 2024

If it works locally for you then I approve these changes. Thanks again for the effort. I'll leave it unmerged until @Marty-D has time to integrate it in his workflow.

Co-authored-by: Christopher Monsanto <[email protected]>
@shrianshChari
Copy link
Contributor Author

If it works locally for you then I approve these changes. Thanks again for the effort. I'll leave it unmerged until @Marty-D has time to integrate it in his workflow.

There is a bug in this code that I'm still looking through; I'll put more details once I'm back at my computer.

@shrianshChari
Copy link
Contributor Author

To follow up on the bug, it's present in the Python 2 version as well, you can merge this PR.

@shrianshChari shrianshChari changed the title Attempt to port codebase to Python3 Port codebase to Python3 Aug 24, 2024
@monsanto
Copy link
Member

Well, what is the bug?

@shrianshChari
Copy link
Contributor Author

shrianshChari commented Aug 24, 2024

In the synthetic data that I used to test the Tera types feature, I came across an issue where when batchMovesetCounter.py listed out the teammates, instead of there being an "Other" section, it would display the last teammate displayed and an extra pipe character. This is what it looked like:

 +----------------------------------------+ 
 | Dragapult                              | 
 +----------------------------------------+ 
 | Raw count: 1307                        | 
 | Avg. weight: ---                       | 
 | Viability Ceiling: 0                   | 
 +----------------------------------------+ 
 | Abilities                              | 
 | Clear Body 87.605%                     | 
 | Infiltrator 12.395%                    | 
 +----------------------------------------+ 
 | Items                                  | 
 | Choice Band 87.605%                    | 
 | Heavy-Duty Boots 12.395%               | 
 +----------------------------------------+ 
 | Spreads                                | 
 | Jolly:0/252/0/0/4/252 87.605%          | 
 | Timid:0/0/4/252/0/252 12.395%          | 
 +----------------------------------------+ 
 | Moves                                  | 
 | U-turn 100.000%                        | 
 | Tera Blast 87.605%                     | 
 | Dragon Darts 87.605%                   | 
 | Quick Attack 87.605%                   | 
 | Hex 12.395%                            | 
 | Will-O-Wisp 12.395%                    | 
 | Other 12.395%                          | 
 +----------------------------------------+ 
 | Teammates                              | 
 | Kingambit 103.039%                     | 
 | Primarina 90.840%                      | 
 | Samurott-Hisui 90.840%                 | 
 | Iron Crown 90.840%                     | 
 | Landorus-Therian 90.840%               | 
 | Zamazenta 12.199%                      | 
 | Zamazenta 12.199%                      | |
 +----------------------------------------+ 
 | Checks and Counters                    | 
 +----------------------------------------+

Specifically, the line that goes | Zamazenta 12.199% | | is the bug.

I thought it appeared when porting to Python 3, but when I ran it in Python 2 with the same data, the same result appeared. I know that when the code runs the following (on line 235 of batchMovesetCounter.py:

for i in range(len(table)): 
    if (total > .95 and x != 'Abilities') or (x == 'Abilities' and i>5) or (x == 'Spreads' and i>5) or (x == 'Teammates' and i>10) or (x == 'Checks and Counters' and i>11):
        if x == 'Moves':
            line = ' | %s %6.3f%%' % ('Other',400.0*(1.0-total))
        elif x not in ['Teammates','Checks and Counters']:
            line = ' | %s %6.3f%%' % ('Other',100.0*(1.0-total))

the total will be large enough to enter the conditional, but because x = 'Teammates', it will just reuse the previous line when printing instead of printing the 'Other' statistics. However, it seems to not occur with every Pokemon in my synthetic data (for example, this only occurs with a couple of Pokemon in my dataset) and I don't see it on the Smogon stats.

Tldr; fixing this (if it even is an issue) is better saved for another PR.

@monsanto
Copy link
Member

Okay, it can be addressed later

environment.yml Outdated Show resolved Hide resolved
Co-authored-by: Christopher Monsanto <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants