GitHub is used by millions of developers to host and share code. It's fantastic, but sometimes developers or code owners accidentally commit confidential information to a public repository, which can be a disaster. There have been many incidents where confidential data was leaked on GitHub. You can't eliminate human error, but you can take action to reduce it.
How do you ensure your repository doesn't contain a password or key? The simple answer: don't store them there. As a best practice, use secret management software to store all sensitive information. In reality, though, you can't control other people's behavior when working in a team.
By the way, if you use Git to initialize and deploy your application, it creates a .git folder, and if that folder is accessible over the Internet, it may expose sensitive information. You don't want that, so consider blocking the .git URI.
The following tools help you find such mistakes in your repository.
Gittyleaks
A free, Python-based utility that finds words like user, password, and email in strings, config files, or JSON.
Gittyleaks can be installed using pip and has an option to find suspicious data.
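To make the idea of keyword matching concrete, here is a minimal sketch in Python. It is not Gittyleaks' own code: the keyword list, the regular expression, and the function name `find_suspicious` are assumptions chosen for illustration, based only on the words mentioned above.

```python
import re

# Hypothetical keyword list, modeled on the words mentioned above.
KEYWORDS = ("user", "password", "email")

# Matches assignments such as  password = "x",  "email": "x",  user: x
ASSIGNMENT = re.compile(
    r"""["']?(?P<key>\w*(?:%s)\w*)["']?\s*[:=]\s*["']?(?P<value>[^"'\s,]+)"""
    % "|".join(KEYWORDS),
    re.IGNORECASE,
)


def find_suspicious(text):
    """Return (line_number, key, value) for each keyword-like assignment."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ASSIGNMENT.finditer(line):
            hits.append((number, match.group("key"), match.group("value")))
    return hits


# Made-up sample covering a config-style line and a JSON-style line.
sample = 'db_password = "hunter2"\n{"email": "dev@example.com"}\nprint("hello")'
for hit in find_suspicious(sample):
    print(hit)
```

A matcher this simple will flag harmless lines too, so treat its output as a list of places to review rather than confirmed leaks.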
Secrets Scanning
GitHub has a secret scanning feature that scans repositories for accidentally committed secrets. Identifying and fixing such leaks helps prevent attackers from finding the secrets and fraudulently using them to access services with the compromised account's privileges.
Key highlights include:
GitHub scans for and detects accidentally committed secrets, helping you prevent data leaks and compromises. It can scan public and private repositories, and it alerts the service providers that issued the detected secrets so they can mitigate the exposure. For private repositories, GitHub alerts the organization owners or administrators and also displays a warning in the repository.
Git Secrets
Released by AWS Labs, Git Secrets does what its name suggests: it scans for secrets. It helps prevent you from committing AWS keys by checking changes against patterns you add. It lets you scan a single file, or a folder recursively. If you suspect your project repository may contain an AWS key, this is an excellent place to start.
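The recursive pattern scan described above can be sketched in a few lines of Python. This is not how Git Secrets is implemented, and its real rule set is broader. The single `AKIA` pattern, the function name `scan_folder`, and the demo file layout are assumptions for illustration.

```python
import os
import re
import tempfile

# AWS access key IDs are commonly 20 characters: "AKIA" plus 16 uppercase
# letters or digits. Using only this one pattern is a simplifying assumption.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_folder(root):
    """Walk root recursively and return (path, line_number) for each match."""
    findings = []
    for folder, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if AWS_KEY_ID.search(line):
                            findings.append((path, number))
            except OSError:
                continue  # unreadable file, move on
    return findings


# Demo on a throwaway folder. The key below is the placeholder value used in
# AWS documentation, not a real credential.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "config"))
    with open(os.path.join(demo, "config", "settings.ini"), "w") as handle:
        handle.write("region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n")
    for path, number in scan_folder(demo):
        print(f"{os.path.relpath(path, demo)}:{number}")
```

Pointing `scan_folder` at a project directory gives a quick first pass. A dedicated tool is still the better choice for blocking commits before they happen.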
Repo Supervisor
Repo Supervisor by Auth0 lets you find misconfigurations, passwords, and other secrets.
It's a serverless tool that can be installed in a Docker container or on any server using NPM.
Truffle Hog
One of the popular utilities for finding secrets everywhere, including branches and commit history. Truffle Hog searches using regex and entropy, and prints the results on the screen.
You can install it using pip.
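Entropy checks are less familiar than regex checks, so a small sketch may help. The idea is that random-looking strings such as keys have high Shannon entropy per character, while ordinary words and identifiers do not. The code below illustrates that technique and is not Truffle Hog's implementation. The names `shannon_entropy` and `high_entropy_tokens`, and the `min_length` and `threshold` defaults, are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of p * log2(1 / p) over the distinct characters.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def high_entropy_tokens(line, min_length=20, threshold=4.0):
    """Return whitespace-separated tokens that are long and look random.

    min_length and threshold are illustrative assumptions, not tuned values.
    """
    return [
        token
        for token in line.split()
        if len(token) >= min_length and shannon_entropy(token) > threshold
    ]


print(shannon_entropy("aaaaaaaa"))  # one repeated character
print(shannon_entropy("abcd"))      # four equally frequent characters
print(high_entropy_tokens("token = 9fK2xQ7bLz0PmWv4Ty8RdHs3"))  # made-up token
```

Entropy alone produces false positives on hashes and minified assets, which is one reason to combine it with regex rules.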
Git Hound
A Git plugin written in Go, Git Hound helps prevent sensitive data from being committed to a repository by checking changes against PCRE (Perl Compatible Regular Expressions) patterns. It's available as a binary for Windows, Linux, Darwin, etc., which is useful if you don't have Go installed.
Gitrob
Gitrob makes it easy for you to analyze the findings on a web interface. It's written in Go, so that's a prerequisite.
Watchtower
An AI-powered scanner that detects API keys, secrets, and other sensitive information. The Watchtower Radar API lets you integrate with public or private GitHub repositories, AWS, GitLab, Twilio, etc. The scan results are available on a web interface or as CLI output.
Repo Security Scanner
Repo Security Scanner is a command-line tool that helps you discover passwords, tokens, private keys, and other secrets accidentally committed to a Git repository. This easy-to-use tool investigates the entire repo history and provides the scan results quickly. Scanning enables you to identify and address the potential security vulnerabilities that exposed secrets introduce in open-source software.
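Scanning the entire history matters because a secret removed in a later commit is still present in earlier ones. The sketch below shows one way that could work: read every commit's patch with `git log -p --all` and check only the added lines. It is an illustration under stated assumptions, not Repo Security Scanner's implementation. The two patterns and the names `read_history` and `scan_patches` are hypothetical.

```python
import re
import subprocess

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
}


def read_history(repo_path):
    """Return every commit's patch text, across all branches, using git itself."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def scan_patches(patch_text):
    """Return (commit, label, line) for added lines that match a pattern."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            added = line[1:]
            for label, pattern in PATTERNS.items():
                if pattern.search(added):
                    findings.append((commit, label, added.strip()))
    return findings


# Made-up patch text in the shape `git log -p` prints, so the demo needs no repo.
sample = """commit 1a2b3c4
diff --git a/app.cfg b/app.cfg
--- a/app.cfg
+++ b/app.cfg
+password = hunter2
+debug = true
"""
print(scan_patches(sample))
```

Against a real repository, the call would be `scan_patches(read_history("."))`, assuming git is installed and the current directory is a repository.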
GitGuardian
GitGuardian is a tool that enables developers, security, and compliance teams to monitor GitHub activity in real time and identify vulnerabilities due to exposed secrets like API tokens, security certificates, database credentials, etc.
GitGuardian allows teams to enforce security policies in private and public code and other data sources. GitGuardian's major features are:
It helps find sensitive information such as secrets in private source code.
It identifies and helps fix sensitive data leaks on public GitHub.
It is an effective, transparent, and easy-to-set-up secret detection tool.
It offers wide coverage and a comprehensive database that covers almost any sensitive information at risk.
It uses sophisticated pattern matching techniques that improve the discovery process and its effectiveness.
Conclusion
I hope this gives you an idea of how to find sensitive data in a GitHub repository. If you are using AWS, then check out this article to scan AWS security and misconfiguration.