When developing software, a well-documented repository is crucial for planning and communicating project progress effectively. SuperRepo enhances version-controlled repositories, such as those hosted on GitHub, by automatically generating detailed documentation using open-source LLMs. It integrates with version control systems to ensure documentation stays up to date as projects evolve.
SuperRepo also offers room for expansion, including visual representations and recommendations for project improvement, feature development, and restructuring. Its design is reliable, modular, and scalable, providing an open-source solution for accurate generative documentation and more.
Name: Kintaro Kawai
Student number: 46985703
It’s essentially an AI-powered wiki that is synced to a project’s version control, allowing it to write and update documentation on every commit.
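The per-commit sync could be driven by a version-control webhook. As a minimal sketch, the payload shape below follows GitHub's push-event JSON, but the helper function and the re-documentation logic are assumptions:

```python
# Sketch of a push-event handler: given a webhook payload, collect the
# files whose documentation needs regenerating. The payload shape follows
# GitHub's push-event JSON; `files_to_redocument` is a hypothetical helper.

def files_to_redocument(push_event: dict) -> set[str]:
    """Return every file added or modified in the pushed commits."""
    changed: set[str] = set()
    for commit in push_event.get("commits", []):
        changed.update(commit.get("added", []))
        changed.update(commit.get("modified", []))
        # Files removed in a later commit need their wiki pages deleted,
        # not regenerated, so drop them from the set.
        changed.difference_update(commit.get("removed", []))
    return changed

# Example payload resembling a GitHub push event:
event = {
    "ref": "refs/heads/main",
    "commits": [
        {"added": ["docs_gen.py"], "modified": ["README.md"], "removed": []},
        {"added": [], "modified": ["docs_gen.py"], "removed": ["legacy.py"]},
    ],
}
print(sorted(files_to_redocument(event)))  # ['README.md', 'docs_gen.py']
```

Processing commits in order means a file deleted after being edited is correctly dropped rather than re-documented.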
Code Base Documentation:
RAG Model and Vector Database:
Wiki Interface:
Version Control Sync:
Third-Party Authentication:
Database Design:
Dashboard for Multiple Repositories:
Visual Diagrams and Tables:
LLM Model Selection:
Quality Assurance and Feedback:
Two example flows of SuperRepo - a new user’s interaction (left), and a user making a change (right):
Github Integration and Authentication:
Textual Documentation Generation:
Contextual Information Management:
Version Control and History:
Wiki Interface:
Repository Scanning:
Public Deployment:
Database Design:
Reliability is a crucial quality attribute for SuperRepo, as it ensures that the generated documentation is accurate and informative. The effectiveness of the prompting phase significantly impacts the reliability of the outputs: with well-crafted prompts and up-to-date models, SuperRepo can produce accurate, informative documentation.
To measure reliability, SuperRepo can record input, output, and satisfaction metrics through behavioral analysis and manual human evaluation. This process provides insights into how well the prompts are constructed, allowing for improvements to enhance the overall reliability of the system.
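One way to record these input, output, and satisfaction metrics is a simple feedback log. A minimal sketch follows; the record fields and the 1-5 rating scale are illustrative assumptions, not a fixed schema:

```python
# Minimal feedback log for prompt reliability. Field names and the 1-5
# satisfaction scale are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class GenerationRecord:
    prompt: str            # input sent to the LLM
    output: str            # documentation it produced
    satisfaction: int      # human rating, 1 (poor) to 5 (excellent)

def mean_satisfaction(records: list[GenerationRecord]) -> float:
    """Average rating across records; a drop signals the prompts need rework."""
    if not records:
        return 0.0
    return sum(r.satisfaction for r in records) / len(records)

log = [
    GenerationRecord("Summarise module utils.py", "utils.py provides ...", 4),
    GenerationRecord("Document class Parser", "Parser reads ...", 5),
]
print(mean_satisfaction(log))  # 4.5
```

Tracking this average per prompt template makes it possible to compare prompt revisions over time.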
Modularity is another key attribute, enabling SuperRepo to evolve beyond documentation generation. By understanding the context of the entire project, SuperRepo can be extended to support additional features such as chat systems that answer code-related questions or provide generative recommendations for improvements.
To achieve and measure modularity, SuperRepo should be tested against diverse scenarios. Designing a versatile API that can handle different prompts and use cases is essential. This API should not only support documentation but also be reusable for other use cases, such as chatbots or recommendation systems.
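Such a versatile API could be sketched as a shared interface that the documentation, chatbot, and recommendation use cases all implement, so new features plug in without touching existing ones. The class and method names below are assumptions:

```python
# Sketch of a reusable prompting interface: each use case builds its own
# prompt but shares one entry point. Names are illustrative assumptions.
from abc import ABC, abstractmethod

class PromptTask(ABC):
    @abstractmethod
    def build_prompt(self, context: str) -> str:
        """Turn repository context into an LLM prompt."""

class DocumentationTask(PromptTask):
    def build_prompt(self, context: str) -> str:
        return f"Write wiki documentation for:\n{context}"

class ChatTask(PromptTask):
    def __init__(self, question: str):
        self.question = question

    def build_prompt(self, context: str) -> str:
        return f"Using this code as context:\n{context}\nAnswer: {self.question}"

def run(task: PromptTask, context: str) -> str:
    # In the real system this would call the LLM; here we return the
    # constructed prompt to show that every task shares one entry point.
    return task.build_prompt(context)

print(run(DocumentationTask(), "def add(a, b): return a + b"))
```

Adding a recommendation feature would then mean writing one more `PromptTask` subclass, leaving `run` and the existing tasks untouched.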
SuperRepo’s database architecture combines NoSQL, SQL, and vector databases to mitigate scalability issues. NoSQL handles raw text documentation, while SQL manages relational data. The vector database is used for the RAG model by storing and querying vector embeddings.
To ensure scalability, SuperRepo will implement sharding to distribute data queries, and load balancing to manage traffic. These strategies should enable the system to handle large data volumes of both structured and unstructured data effectively.
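The vector database's role in the RAG pipeline can be illustrated with an in-memory stand-in: store (text, embedding) pairs and retrieve the closest entries by cosine similarity. The two-dimensional toy embeddings are assumptions; a real deployment would use an embedding model and a dedicated vector database:

```python
# In-memory stand-in for the vector store used by the RAG model:
# documents are kept as (text, embedding) pairs and queried by cosine
# similarity. The 2-D embeddings are toy values for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

store = [
    ("auth module handles login", [0.9, 0.1]),
    ("parser reads source files", [0.1, 0.9]),
]

def top_k(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k stored texts most similar to the query embedding."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(top_k([0.8, 0.2]))  # ['auth module handles login']
```

At scale, this linear scan is exactly what the vector database replaces with approximate nearest-neighbour indexing, and sharding the store distributes the query load.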
To evaluate the reliability of SuperRepo, we will record input, output, and satisfaction metrics through behavioral analysis and manual human evaluation. This process will assess the accuracy and informativeness of the generated documentation, providing insights into how well the prompts are constructed for the LLMs. Continuous feedback will be used to refine prompts and improve the overall reliability of the generated documentation.
Modularity will be evaluated by testing SuperRepo against diverse scenarios to assess its ability to support additional features beyond documentation generation. It will be assessed by how easily new features can be integrated into the system without disrupting existing functionality, ensuring that SuperRepo can evolve efficiently.
Scalability will be evaluated by continuously monitoring system performance under varying loads to ensure the system maintains high performance as it grows. The effectiveness of the sharding, replication, and load balancing strategies will be assessed by simulating heavy traffic with stress-testing software such as JMeter, as well as by using tools like Grafana or Prometheus to monitor metrics such as CPU usage, memory usage, and request rates.
Word Count: 1107 words