Has anybody considered using LLMs to reverse engineer legacy (mainframe) code to generate documentation? GitHub Copilot is quite good at providing low-level documentation, but we would love to take a large volume of mainframe code and generate the requirements and technical designs directly from the code, i.e., what business processes and designs are embedded in the code? My limited experimentation and research tell me I am being overly optimistic, but understanding legacy code is, alas, not an unusual requirement, so I would love to get insights from my peers.
Any updates on this? Have you tried anything yet? I am doing similar research but haven't gotten very useful results yet.
IBM watsonx Code Assistant for Z has the capability to generate documentation and/or transform COBOL to Java on Z. https://www.ibm.com/products/watsonx-code-assistant-z
We worked with IBM last year to document a legacy PHP application using a combination of an LLM (I think we settled on Llama in the end after testing quite a few) and traditional software documentation tools. The output was an HTML-based wiki, which was very helpful because we want to rearchitect but so much of the business logic was buried in thousands of lines of code. At that time GitHub Copilot was still in its infancy and wasn't the right tool for our task, but if your code is in GitHub it's the first option to consider. One of the challenges with this approach is that it can take a lot of effort (and a good data scientist) to set this up for a point-in-time snapshot of your code base, and ideally you would want the documentation to be updated every time you push a code release. I'd be interested to see if GitHub Copilot addresses the lifecycle maintenance challenges of documenting legacy code too.
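To make the "update on every release" point concrete, here is a minimal sketch of one way to find which files need their documentation regenerated after a push. It is not what we built with IBM; the function names, the manifest format, and the `*.php` pattern are all assumptions for illustration. The idea is to store a content hash per source file and re-run the (expensive) LLM step only on files whose hash changed.

```python
import hashlib
import json
from pathlib import Path


def scan_sources(root: Path, patterns=("*.php",)) -> dict[str, str]:
    """Map each matching file (path relative to root) to its SHA-256 digest."""
    digests = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def stale_sources(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return files that are new or whose content changed since the old manifest."""
    return sorted(p for p, digest in new.items() if old.get(p) != digest)


def load_manifest(manifest_path: Path) -> dict[str, str]:
    """Read the previous manifest, or an empty one on the first run."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text())
    return {}


def save_manifest(manifest_path: Path, manifest: dict[str, str]) -> None:
    """Persist the manifest; call this only after regeneration succeeded."""
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

In a release job you would call `load_manifest`, then `scan_sources`, pass both to `stale_sources`, regenerate documentation for just the returned paths, and only then call `save_manifest`. Saving last means a failed run is retried on the next push instead of being silently skipped.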
I believe it is possible, but off-the-shelf LLMs may not perform as expected. I would recommend fine-tuning your own model, starting from a small, specialized LLM (see "StarCoder: A State-of-the-Art LLM for Code" on huggingface.co) or from Mixtral (huggingface.co). If you do not have the capacity for that, then since this is mainframe code, maybe look at Watson from IBM; they should have solutions for your needs.
We are searching for something similar. We found some vendors that have tried to build tools that generate documentation with AI/LLM solutions, and also a system integrator that built a valuable tool for its migration projects and is now extending it into a standalone product. So far, though, we are still evaluating solutions for a PoC.