project-proposal-2025

Sani: Digital Hand Sanitiser

Abstract

AI is everywhere—but so are your secrets. Every time you use AI, you risk exposing sensitive data: names, addresses, passwords, company IP. Judges have been caught using AI on confidential case files, and employees spend hours scrubbing documents just to use AI safely. But even when policies ban it, people still do it—creating massive security risks.

Sani makes AI safe to use by automatically sanitising sensitive data before it reaches third-party models. Its security-first architecture enforces least privilege, ensuring data is only processed where necessary. A modular design isolates processing nodes and lets organisations configure their own sanitisation workflows. And with an extensible framework that evolves with emerging threats, Sani adapts to new risks without compromising security.

Author

Name: Benjamin Rose

Student number: 47437870

Functionality

Create or Download a Wash-Guide

Choose from pre-made wash-guides or create a custom one.
Define what types of sensitive information need sanitisation (e.g., names, dates, addresses).
Store the wash-guide locally for personal use or share it for team-wide use.

Set Up & Configure Sanitisation

Install and configure washing machines on your device or cloud.
Test sanitisation rules before applying them to actual data.
Adjust settings based on risk tolerance and security needs.

Sanitise Data Locally or in the Cloud

Process data on your own device for privacy-sensitive tasks.
Use cloud-based sanitisation for high-volume or intensive processing.
Choose industry-specific processing for compliance with regulations.

Verify Data Integrity with Certificates

Check that each step of the sanitisation process is logged and certified.
Review attached certificates to confirm data authenticity.
Flag or report inconsistencies if verification fails.
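
One way the certificate checks above could work is a chain of keyed tags, where each step certifies a hash of its output and links back to the previous certificate. The sketch below is only an illustration: the field names, the helper functions, and the use of a shared HMAC key are assumptions, and a real deployment might prefer asymmetric signatures so that verifiers cannot forge certificates.

```python
import hashlib
import hmac
import json

# Hypothetical certificate fields; the proposal does not fix a format.
CERT_FIELDS = ("machine", "data_sha256", "prev_tag")


def _tag(body: dict, key: bytes) -> str:
    """Keyed tag (HMAC-SHA256) over a canonical JSON encoding of the body."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_certificate(data: str, machine: str, key: bytes, previous: list) -> dict:
    """Certify one sanitisation step and link it to the step before it."""
    body = {
        "machine": machine,
        "data_sha256": hashlib.sha256(data.encode()).hexdigest(),
        # Linking to the previous tag means steps cannot be dropped or reordered.
        "prev_tag": previous[-1]["tag"] if previous else "",
    }
    return {**body, "tag": _tag(body, key)}


def verify_chain(final_data: str, certs: list, key: bytes) -> bool:
    """True only if every certificate is intact and the last one matches the data."""
    if not certs:
        return False
    prev_tag = ""
    for cert in certs:
        body = {name: cert[name] for name in CERT_FIELDS}
        if cert["prev_tag"] != prev_tag:
            return False
        if not hmac.compare_digest(_tag(body, key), cert["tag"]):
            return False
        prev_tag = cert["tag"]
    final_hash = hashlib.sha256(final_data.encode()).hexdigest()
    return certs[-1]["data_sha256"] == final_hash


key = b"demo-key-not-for-production"  # assumption: key distribution is out of scope here
certs = []
after_funk = "Met the client in Brisbane on [DATE]."
certs.append(make_certificate(after_funk, "Funk", key, certs))
after_parsnip = "Met the client in Sydney on [DATE]."
certs.append(make_certificate(after_parsnip, "Parsnip", key, certs))

print(verify_chain(after_parsnip, certs, key))              # True
print(verify_chain(after_parsnip + " edited", certs, key))  # False
```

A failed check here is the point at which an inconsistency would be flagged or reported.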

Reintroduce Sanitised Data

Restore cleaned data back into workflows or AI processing.
Maintain reversibility where needed for authorised users.
Ensure data integrity remains intact throughout the process.

Monitor & Audit Activity

View logs of past sanitisation runs.
Generate compliance reports for security audits.
Cross-verify logs across different processing nodes.
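
Cross-verification of logs could be as simple as comparing the digests that each node recorded for the same run. The sketch below assumes every node logs a digest of the payload at the hand-over point, keyed by a run identifier. The log layout and the `cross_verify` name are hypothetical.

```python
def cross_verify(logs_by_node: dict) -> dict:
    """Find runs where nodes disagree about the payload they handled.

    logs_by_node maps node name -> {run_id: digest of the payload at hand-over}.
    Returns run_id -> set of conflicting digests, only for runs with a mismatch.
    """
    seen = {}
    for entries in logs_by_node.values():
        for run_id, digest in entries.items():
            seen.setdefault(run_id, set()).add(digest)
    return {run_id: digests for run_id, digests in seen.items() if len(digests) > 1}


# Illustrative log contents only; digests are shortened for readability.
logs = {
    "local": {"run-1": "abc", "run-2": "def"},
    "cloud": {"run-1": "abc", "run-2": "xyz"},
}
mismatches = cross_verify(logs)  # only run-2 is reported
```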

Scope

Wash-Guide Creation

Create or import wash-guides to define sanitisation rules.
Specify dependencies such as washing machines and data lists.
Store wash-guides locally or share them for collaborative use.
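
As a rough illustration of what a wash-guide might contain, the sketch below models one as a small data class that round-trips through JSON for local storage or sharing. Every field name, and the `WashGuide` class itself, is an assumption made for this example rather than a committed format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WashGuide:
    """Hypothetical wash-guide layout; field names are illustrative only."""

    name: str
    # Types of sensitive information to sanitise, e.g. names, dates, addresses.
    targets: list
    # Washing machines the guide depends on, in the order they should run.
    machines: list
    # Data lists the machines need, e.g. a list of city names for swaps.
    data_lists: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the guide so it can be stored locally or shared."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "WashGuide":
        """Load a guide that was stored or shared as JSON."""
        return cls(**json.loads(text))


guide = WashGuide(
    name="example-guide",
    targets=["names", "dates", "addresses"],
    machines=["Funk", "Parsnip", "CertBot"],
    data_lists=["city-names"],
)
restored = WashGuide.from_json(guide.to_json())
```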

Washing Machines (Core Implementation)

CertBot: Appends a certificate to confirm data has been sanitised. This machine is the first implementation priority.
Funk: Handles type-based sanitisation, such as replacing dates.
Parsnip: Performs context swaps using large data sets (e.g. replacing “Brisbane” with “Sydney”).
MickTagger: Uses a small AI word tagger to assist with sanitisation.
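
To make the difference between type-based sanitisation and context swaps concrete, here is a minimal sketch of what Funk-style and Parsnip-style steps might do. The function names, the `[DATE]` placeholder, the ISO date format, and the swap table are all assumptions for illustration.

```python
import re

# Assumption: dates arrive as YYYY-MM-DD; other formats would need more patterns.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def funk_replace_dates(text: str, placeholder: str = "[DATE]") -> str:
    """Type-based sanitisation: replace anything shaped like a date."""
    return DATE_PATTERN.sub(placeholder, text)


def parsnip_swap(text: str, swaps: dict) -> str:
    """Context swap: replace each known term with its stand-in in a single pass."""
    if not swaps:
        return text
    # Longest terms first so a longer name is matched before a shorter one inside it.
    terms = sorted(swaps, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # A single pass avoids chained swaps (A -> B, then B -> C on the same word).
    return pattern.sub(lambda match: swaps[match.group(1)], text)


raw = "Met the client in Brisbane on 2025-03-14."
clean = parsnip_swap(funk_replace_dates(raw), {"Brisbane": "Sydney"})
print(clean)  # Met the client in Sydney on [DATE].
```

The type-based step needs no data set but only catches values with a recognisable shape, while the swap step depends entirely on the coverage of its term list.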

Local-Sani

Manage washing machines and import wash-guides via a GUI.
Sending data to and from third-party AI systems is not required, but sanitised output should at least be easy to copy out.
Verify and sign certificates to ensure data integrity.

Cloud-Sani

Apply additional sanitisation in the cloud using one or more washing machines.
Plan for auto-scaling of processing; it does not need to be working.
Update certificates to maintain verification across different sanitisation stages.

Quality Attributes

Security

Objective: Security is paramount in ensuring that sensitive data remains protected throughout the sanitisation process. By clearly defining the responsibilities between the user, their organisation, and Sani, we ensure a transparent and trusted system.
Why It’s Important:
- Data Integrity: A certificate-based system protects data integrity by making tampering detectable and by verifying that data was properly sanitised.
- Risk Management: Customising security based on each customer’s risk tolerance ensures that the solution is adaptable to different levels of security needs.
- Compliance: By enforcing least privilege principles, Sani helps mitigate potential security breaches during both local and cloud processing, which is vital for compliance with industry regulations.

Extensibility

Objective: Extensibility is critical because sanitisation requirements vary across industries and customer needs. Sani must adapt to these diverse demands without requiring major system changes.
Why It’s Important:
- Future-Proofing: Allowing third-party developers to create new washing machines ensures that Sani can evolve to meet new challenges as they arise.
- Scalability: Extensibility enables the seamless addition of new processing nodes and functionalities, which is vital for keeping up with growing data needs or new sanitisation techniques.
- Industry Flexibility: The ability to easily integrate new processing nodes allows Sani to cater to industries with unique and constantly changing requirements.

Modularity

Objective: Modularity ensures that Sani can be highly customisable, allowing customers to tailor the system to their specific needs. This also promotes both security and extensibility.
Why It’s Important:
- Customisation: The modular design enables independent updates or replacement of individual components, offering flexibility in adapting the system without disrupting the entire architecture.
- Security Benefits: By isolating functions into discrete modules, we limit the impact of a breach, so that a compromise of one module is contained rather than spreading to the others.
- Easier Integrations: Modularity ensures that new functionalities can be integrated smoothly into the existing framework, allowing Sani to stay relevant as customer needs evolve.

Evaluation

Security

Mallory has intercepted the communication channel and edits the prompt sent between Cloud-Sani and Local-Sani. How would Local-Sani recognise that the communication has been tampered with? Focus on designing for authenticity and integrity.
Justify your approach to persistence: where secrets are stored, for how long, and who can query them and why.
Describe how your architecture uses defence in depth and the principle of least privilege to manage secret maps. How does the system avoid ever making, storing or sharing such maps?
Identify the biggest architectural security risk fundamental to the project. Justify your choice. As a red hat, design a high-level credible threat.

Extensibility

Upload a Wash-Guide to local-Sani that depends on one or more Washing-Machines that are not currently installed. The system must automatically recognise the dependency, pull the image from a public repository, and configure it as a usable Washing-Machine.
Apply a wash-guide that specifies local processing, then apply to the same system a wash-guide that requires both local and external processing. Prove that each workflow only went through the correct locations. Describe how you could layer more processing nodes and what trade-offs make this approach desirable.
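
The dependency check in the first scenario above might start from something like the following sketch, which works out which required Washing-Machines are missing and builds the image references to pull. It does not pull anything. The registry address, the image naming scheme, and both function names are hypothetical.

```python
def missing_machines(required: list, installed: set) -> list:
    """Return required machines that are not installed, keeping wash-guide order."""
    missing = []
    for name in required:
        if name not in installed and name not in missing:
            missing.append(name)
    return missing


def plan_install(required: list, installed: set,
                 registry: str = "registry.example/sani") -> list:
    """Build hypothetical image references for the missing machines."""
    return [f"{registry}/{name.lower()}:latest"
            for name in missing_machines(required, installed)]


required = ["Funk", "Parsnip", "CertBot"]
installed = {"CertBot"}
to_pull = plan_install(required, installed)
# to_pull lists one image reference each for Funk and Parsnip.
```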

Modularity

Develop a unit-test suite for each Washing-Machine. Use time and space complexity to evaluate use cases for each machine. Explain the trade-offs between Parsnip and MickTagger when sanitising proper nouns.
Use the previous data to predict the complexity of larger workflows with multiple processing nodes and Washing-Machines. Prove you can make predictions about system complexity partially based on the sum of the complexity of its modules. Why is it important to predict complexity for your customers' custom workflows?
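
A first-order version of such a prediction could sum a per-machine cost and add a fixed overhead for each hand-over between machines. The sketch below assumes each machine's cost grows linearly with input size, which the unit-test measurements would need to confirm. The cost figures are placeholders, not measurements, and the function name is hypothetical.

```python
def predict_seconds(workflow: list, seconds_per_kchar: dict, kchars: float,
                    overhead_seconds: float = 0.0) -> float:
    """Estimate workflow time as summed per-machine cost plus hand-over overhead.

    Assumes each machine's cost is linear in input size and independent of the
    other machines; overhead_seconds is charged once per hand-over between machines.
    """
    per_machine = sum(seconds_per_kchar[name] * kchars for name in workflow)
    handovers = max(len(workflow) - 1, 0)
    return per_machine + overhead_seconds * handovers


# Placeholder costs for illustration only; real values would come from the test suite.
placeholder_costs = {"Funk": 0.5, "Parsnip": 2.0}
estimate = predict_seconds(["Funk", "Parsnip"], placeholder_costs,
                           kchars=40, overhead_seconds=1.0)
print(estimate)  # 101.0
```

Comparing such an estimate against a measured run shows how much of the total the module sum explains and how much comes from interactions between modules.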