Comment by pthangeda

1 year ago

Congratulations on the launch - this looks great and I've been waiting for years for something like this! As a researcher who mostly uses Python, and explores/navigates a large number of repos for a short time, often written by other researchers not necessarily trained in software best practices, I was always frustrated (and surprised) that there was no VS Code extension or tool that gave me a quick overview/visualization to get a high level gist of different modules and code/data flow!

I tried this with a bunch of small open-source repos and it works great! I imagine using an LLM might be a hard no for some people/enterprises - any plans to offer stand-alone licenses with small local models? It seems like for what the LLM is doing here (if I understand it right, helping label the modules in natural language and perhaps helping organize them into this hierarchy/modules) you don't necessarily need a SoTA model, right?

Also, this could be coming from LLMs, but I see that the visualizations are biased towards terminology used in web-dev? (for example, one of my robot-related repos was organized into front-end, back-end, etc., which I guess is kinda right but not exactly lol). It would be nice to see an interactive visualization where I can iterate on the initial viz with information I know, e.g., I drag and drop a module or rename it and then you probably do another pass with this feedback and LLMs and update my overall visualization with more domain-specific labels and partitions?

Edit: Exploring CodeViz on a few more repos, and it seems like you have a set of hardcoded labels for the highest level of the hierarchy in the architecture diagram? (so far, I've only seen Users, Databases, Backend, Frontend, and Shared Components). I am guessing this is something passed in through the prompts? It'd be nice to allow users to define their own set of labels/partitions at one or more levels and then try to create an architecture visualization that fits into these labels/constraints (although I am guessing at some point you have to be wary of hallucinations?)

5 comments


WillMcCall 1 year ago

Re: Edit

The top-level categorizations are indeed fixed; however, the nodes themselves can be arbitrary. We've found this helps with grouping and organization while still allowing for the flexibility required to accommodate different systems. I'm curious, are there any categories missing here that could be added?

Currently, we categorize by: Frontend (UI/UX elements), Backend (API/Business/Data Access), DB (persistent storage), External Services (backends maintained outside the codebase), Shared Components (internally maintained libraries and helpers).

zingar 1 year ago

I don’t think I’d draw a diagram by hand with an explicit “frontend” label, I think I’d most likely leave it without a label. If I had to choose I would leave it labeled as “UI”.
- WillMcCall 1 year ago
  
  That makes sense, we'll make this change!

0xEF 1 year ago

I like this perspective. I am just starting to dip my toes into software development and one thing I love about the industry as a whole is that it allows for and often encourages new ways of doing a thing, which in turn promotes new ways of thinking about things. So many industries I interact with are stuck in a "that's the way it's always been done" loop and it is maddening, sometimes.

LiamPrevelige 1 year ago

We'd love to use local models and have played around with them a bit. Exactly right about the labeling - we didn't stick with local models because 3.5 Sonnet is exceptionally good at finding niche architecture labels and merging similar modules (which is needed since code analysis is chunked). Copilot tools are becoming very popular, so companies are getting less strict around LLMs and code, but ultimately we do think everyone is better off if CodeViz is self-contained.

There is some hard-coded bias for web dev. Diagram modification is definitely high on our to-do list, and we've been finding ways to reduce pre-defined structure in our prompts to LLMs so they work with broader tech stacks. When we sell licenses to teams, we do some manual checks for accuracy and detail, which helps us improve the public extension.

What's the name of the robotics repo? And any preference for modifying the diagram directly vs instructing changes by text?