
ChatGPT #390

Open
dataf3l opened this issue Jun 13, 2023 · 5 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

dataf3l commented Jun 13, 2023

Describe the bug
With the advent of ChatGPT, is it possible the estimated cost to rebuild things will change? Should we then change the estimates generated by the tool to take into consideration things like Copilot, ChatGPT, etc.?

To Reproduce

Ran scc on a codebase that took very little effort to write, and it claims it took 3 months (not the actual case).

Expected behavior

a more realistic estimate

Desktop (please complete the following information):

  • OS: macOS
  • Version: scc 3.0.0

boyter (Owner) commented Jun 14, 2023

Good news! You can actually do this yourself:

scc --cocomo-project-type "custom,1,1,1,1"

Where you change the values to get the result you want.

The model itself is COCOMO https://en.wikipedia.org/wiki/COCOMO so you can look at how it works to get an idea.
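
For reference, here is a sketch of the basic COCOMO 81 equations that the four custom values plug into. The coefficients below are the classic "organic" project values from the published model; the exact wiring inside scc may differ slightly, so treat this as illustrative:

```python
# Sketch of the basic COCOMO 81 model behind
# scc --cocomo-project-type "custom,a,b,c,d".
# Default coefficients here are the classic "organic" values.

def cocomo_estimate(sloc, a=2.4, b=1.05, c=2.5, d=0.38):
    """Return (effort in person-months, schedule in calendar months)."""
    kloc = sloc / 1000.0
    effort = a * kloc ** b      # person-months to develop
    schedule = c * effort ** d  # calendar months elapsed
    return effort, schedule

effort, months = cocomo_estimate(10_000)  # a 10 KLOC project
```

With `"custom,1,1,1,1"` the effort simply equals KLOC, which is why tuning these four numbers down is the quickest way to model AI-assisted productivity gains; `--avg-wage` then converts effort into a dollar figure.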

You can couple this with --avg-wage to get a more realistic value, but keep in mind the COCOMO model itself is based on time to develop including the cost of the hardware, building, chair you sit in, etc., i.e. overall project cost.

If you come up with a custom type that works as you would expect, pick a good name for it (current ones are organic, semi-detached, embedded) and I can include it in as an option.

It sounds like you might be suggesting a better model, though. I am open to this idea, but I need to see how the model is actually implemented. COCOMO was my first choice since I wanted to replicate the functionality of sloccount https://dwheeler.com/sloccount/ and because the model is fairly easy to implement. However, I am very open to having multiple calculation models in the tool if it gives better results. Of course, I need an example of how to implement it first.

I did investigate COCOMO II for a while but could never find an implementation to copy from. I guess we could develop our own based on the same values that COCOMO uses, but factor in complexity, and perhaps even the language, as scaling factors; however, I would prefer to use something that already exists.

@dataf3l If you feel like finding some weights that fit your expectations across multiple projects, post them here; I will try them across a few and see what shakes out. If it seems good, you get the glory of picking a new name.

If however you find a better model, I will be happy to add the ability to flip from COCOMO to something else. I'm also happy to do both if you find that a better option.

boyter added the enhancement and help wanted labels on Jun 14, 2023

dataf3l (Author) commented Jun 14, 2023

AI should reduce the cost of development:

https://www.youtube.com/watch?v=VErKCq1IGIU&ab_channel=MOVClips

I kinda expected to be told to go change the avg-salary variable; I kinda knew that. What I'm trying to point out is that I think the world needs a "cocomo.ai" or something similar, based on these (new) assumptions:

  1. Development is done half by AI and half by humans; development in the future may just be AIs talking to each other, which will affect this project and its usefulness.
  2. AI helps tremendously in the development of software, impacting cost and quality.
  3. There are multiple AIs, each with its own price and benefits.
  4. A lot of people are now using ChatGPT: https://trends.google.com/trends/explore?date=today%205-y&q=chatgpt,javascript&hl=en , and English could be considered a "programming language".

So, in the short term, we can expect gains, but what about the long term?

https://www.youtube.com/watch?v=GFiWEjCedzY&ab_channel=DisneyLivin

Are people going to have to be master wizards to fix the bugs introduced by the AI, and will that impact maintenance cost?
Will people scrap systems built by AI and proceed with non-generated code instead? Will people split into camps? Will this split become kinda like vegans vs non-vegans?

So with all these things in mind, I think this may affect the scc project, and others like it, such as cloc and friends.

I think AI will bring a new era of software, where developers are more abundant, since more people will become
developers, but true depth of understanding of the software may elude the new generation, except for the
more dedicated few. In the same way compilers made people forget what EAX stands for, it's possible that
this new "compiler" will create a new "wave of code".

Will surfing waves of code become the new normal? How will tools evolve to rise to this challenge?

I think there is a need for a new costing model. Whether it's called "cocomo III" or "cocomo.ai" or whatever is not the main
and most important thing, but rather: HOW can cost be measured? Is man-hours still the metric?

If developers ask the AI to build the code, go to sleep, and then wake up 8 hours later to review the code written and tested by the AI, and find that the code is OK and ready to go, should they charge 8 hours? Was that 8 hours of labor? Should that be priced in?

If developers need a huge computer to run a local model, like LLaMA or Alpaca, should the cost of the GPUs be taken into consideration? Are slow laptops no longer usable for development in the new AI age?

will there be an abundance of these GPUs?

I think that the tools need to adapt to the new reality. I don't have a plan for how to accomplish a more realistic model, but here are some numbers:

https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/

If things like Copilot and ChatGPT make developers more productive, how can that be quantified into SCC?

If Copilot generates more repetitive code, and developers are no longer incentivized to make a beautiful masterpiece but will rather "crank out repetitive code" because that's what the AI gives and they are too lazy to change it (they are developers, after all), does this mean "more code" no longer equates to "more time"?

What if a developer asks in the prompt explicitly for "no comments please", or "make it terse"? Will these out-of-distribution prompts on the code request cause the number of lines of code to become a meaningless metric?

Just some things I've been thinking about.

@boyter I give you instead the honor and glory of naming it whatever you desire, but will you please help this new, confused world with a new version of scc which uses dates, and perhaps also a statistical model that predicts whether a line of code was made by AI or not, so we can better assess what was made by AI (cheaply) and what was made by humans (costing dearly), and truly get to a truer perspective of real(er) cost?

These, I think, are the new challenges.

boyter (Owner) commented Jun 15, 2023

I'm going to park this, in the sense that I am not actively developing it.

I am not saying no. I am saying not right now. Simply because the landscape is moving too quickly at the moment to start implementing. Waiting a few months seems like a decent approach.

I am keeping a close eye on the state of things; however, the following still appears to be true.

Lines of code is still a metric that can inform, because at some point someone needs to maintain or understand the code, and as such it still has meaning. On top of that, the number of lines is an indicator of where there might be issues.

The cost question is one to consider, but again it's estimating the cost to develop based on the assumption that a person wrote the code. If there is a way to model code based on the assumption that a computer wrote it, then we can use that (assuming one exists), and if we can determine who wrote the code, that would allow a better breakdown.

Until we have the following,

  1. A way to reliably determine if code was generated through GPT or something like it.
  2. An updated COCOMO model to deal with point 1

There is little that can be done right now. I am keeping an eye on both of these. It's possible there will need to be a new tool to deal with these problems and have that feed into scc, but that remains to be seen.

dataf3l (Author) commented Jun 15, 2023

I agree that the landscape moves quickly.

Lines of code can now be summarized, refactored, and expanded; that's a new thing, it wasn't there before.

This tool can detect GPT-2:

http://gltr.io/

It uses the model weights to determine whether text was generated or not. OpenAI has also released a tool to determine if stuff came out of GPT-3, but as time goes by and tools improve, this task will become harder and harder.

https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text

Also, there will be multiple models, so there is that.

Maybe we can just put in a multiplier for lines identified as not written by humans.

For example, a million-line XML file probably wasn't done by hand.
Another example: a million lines of CSV probably wasn't written by hand either, and should be discounted.
Should auto-generated code be discarded? (Not 100% sure on this one, but I think most likely yes.)
Should files which are mostly auto-generated but had a small amount of user input be discarded?

Nobody writes their package-lock.json by hand, for example.

So maybe look at the variability of the file itself; if it looks very repetitive, perhaps it's an output, not human-made code.
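
The repetitiveness idea above can be sketched with a simple compression-ratio heuristic: machine-generated dumps (lock files, CSV, XML) tend to compress far better than hand-written code. This is purely illustrative, and the 0.05 cutoff is a made-up assumption, not a tuned value:

```python
# Illustrative only: flag files whose contents compress extremely well,
# on the hunch that highly repetitive files are generated, not hand-written.
import zlib

def looks_generated(text: str, threshold: float = 0.05) -> bool:
    """Crude heuristic: a compression ratio below `threshold` (an
    arbitrary cutoff) suggests repetitive, likely machine-made content."""
    raw = text.encode("utf-8")
    if not raw:
        return False
    ratio = len(zlib.compress(raw)) / len(raw)
    return ratio < threshold

# A dumped CSV compresses almost to nothing; a short hand-written
# snippet does not come close to the threshold.
csv_dump = "id,value\n" + "1,foo\n" * 100_000
```

A real classifier would need to combine something like this with file-name rules (package-lock.json, *.pb.go) and perhaps a model-based detector, since terse human code and verbose generated code overlap.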

Libraries, like node_modules, /vendor, /venv, etc., should be discarded as well.

I think that even though we can't solve 100% of the problem, some heuristic for evaluating and discarding files
could be used to affect the model. If we think a human didn't write it, and will probably never read it, why count it?

Maybe another tool is required to discard/categorize files, which can be used as an input for scc?
An is-it-ai-or-not.py, for example?

And another thing, thank you for setting expectations from the start, I appreciate it.

boyter (Owner) commented Jun 15, 2023

To solve package-lock.json and other such files, I strongly suggest using a .ignore file, which will be respected much as tools like ripgrep, the silver searcher and such respect it.

I have debated adding vendor and such as defaults to ignore, but I think it's a default that would surprise the user, hence I did not include it.

I am in the middle of refactoring scc to have proper support for all features of gitignore and ignore files, but that's still a little way off. Avoid globs for the moment and it works, though.

As for determining if it was written by GPT, that tool looks interesting. I wonder if you could use it to append a comment to files, or dynamically update the ignore file to support this. You can use the various scc options to disable looking at certain files through --no-gen and --generated-markers, although this only applies to files as a whole.
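
The "dynamically update the ignore file" idea could look something like the sketch below: run some detector over the tree and append flagged paths to .ignore so scc skips them on the next run. This is hypothetical glue, and `is_probably_generated` is a stand-in for whatever real classifier (GLTR, a compression heuristic, etc.) gets used:

```python
# Hypothetical wrapper: append files a detector flags as generated to
# .ignore so scc skips them. `is_probably_generated` is a placeholder
# for a real classifier.
from pathlib import Path

def is_probably_generated(path: Path) -> bool:
    # Placeholder heuristic: treat well-known lock files as generated.
    return path.name in {"package-lock.json", "yarn.lock", "Cargo.lock"}

def update_ignore(root: Path) -> list[str]:
    """Append newly flagged files to root/.ignore; return what was added."""
    ignore_file = root / ".ignore"
    existing = set(ignore_file.read_text().splitlines()) if ignore_file.exists() else set()
    added = []
    for path in root.rglob("*"):
        if path.is_file() and is_probably_generated(path):
            rel = path.relative_to(root).as_posix()
            if rel not in existing:
                added.append(rel)
    if added:
        with ignore_file.open("a") as f:
            f.writelines(line + "\n" for line in added)
    return added
```

Because scc already honors .ignore, nothing in scc itself would need to change for this kind of pre-pass to work.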
