Running Talisman CLI on GitLab CI Servers

Jul 11, 2020

Talisman is an open-source tool developed and maintained by ThoughtWorks Technologies. It is a language-independent scanner that checks the code in a git repository for passwords and other confidential information that may have been checked in as part of a commit. More on Talisman at https://github.com/thoughtworks/talisman.

Although Talisman works predominantly through git hooks that are triggered by a git commit or push on developer machines, it also provides a way to run the tool from the command line with the Talisman CLI.

Running Talisman CLI on Gitlab CI

The code and sample implementation used in this blog can be found in the GitLab repository below:

PraveenMathew92 / talisman-demo (gitlab.com)

Demo project that shows the different ways to run Talisman CLI (https://github.com/thoughtworks/talisman) on as a job…

Please feel free to fork it. Contributions are welcome 🤠

Set Up the CI Pipeline

In GitLab, the .gitlab-ci.yml file in the root of the project defines the jobs and stages that the pipeline runs. The GitLab CI/CD documentation covers this file in more detail.

Create .gitlab-ci.yml in the root of the repository if it does not already exist, and add the following to it.

stages:
  - talisman

This creates a stage named talisman in the CI pipeline.

Next, add a job named Talisman.

Talisman:
  stage: talisman
  before_script:
    - mkdir ~/.talisman
    - cd ~/.talisman 
    - curl -L -O https://github.com/thoughtworks/talisman/releases/download/v1.3.0/talisman_linux_386
    - chmod +x talisman_linux_386

The Talisman job runs in the talisman stage of the GitLab CI pipeline. Before the Talisman CLI can scan the repository, it needs to be installed on the CI server. This is done in the before_script parameter.

Talisman ships a set of machine-dependent binaries (full list here). At the time of this writing, the GitLab CI servers require the one fetched from https://github.com/thoughtworks/talisman/releases/download/v1.3.0/talisman_linux_386. The binary is downloaded into the .talisman folder in the HOME (~) of the CI server and made executable.
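For readers who prefer to script the install step, here is a minimal Python sketch of what the before_script does (mkdir, curl -L -O, chmod +x). The function names release_url and install are hypothetical helpers made up for this illustration. The URL pattern is simply the one used in this post, so treat it as an assumption for other versions or binaries.

```python
import shutil
import stat
import urllib.request
from pathlib import Path

# Base of the release URL used in this post (assumed to hold for other versions too).
RELEASES = "https://github.com/thoughtworks/talisman/releases/download"


def release_url(version: str, asset: str) -> str:
    """Build the download URL for one release asset, e.g. v1.3.0 / talisman_linux_386."""
    return f"{RELEASES}/{version}/{asset}"


def install(url: str, dest_dir: Path) -> Path:
    """Download the file at `url` into `dest_dir` and mark it executable."""
    dest_dir.mkdir(parents=True, exist_ok=True)      # mkdir ~/.talisman
    target = dest_dir / url.rsplit("/", 1)[-1]       # keep the remote file name, like curl -O
    # urlopen follows HTTP redirects, like curl -L
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    # chmod +x: add the execute bits to the existing permissions
    target.chmod(target.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Calling install(release_url("v1.3.0", "talisman_linux_386"), Path.home() / ".talisman") would mirror the job above. In the pipeline itself the plain shell commands are simpler and need no Python on the runner.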

Run the installed binary

Talisman:
  stage: talisman
  before_script:
    - mkdir ~/.talisman
    - cd ~/.talisman 
    - curl -L -O https://github.com/thoughtworks/talisman/releases/download/v1.3.0/talisman_linux_386
    - chmod +x talisman_linux_386
    - cd -
  script:
    - ~/.talisman/talisman_linux_386 --scan

Once the binary is installed, run Talisman from the root of the repository with the command

~/.talisman/talisman_linux_386 --scan

The scan parameter scans the git commit history for potential secrets and writes a report in JSON format to a newly created folder, talisman_reports/data.

Run the command from the script parameter. Make sure to switch back to the repository root before the script runs (the cd - at the end of the before_script), since the before_script changed into ~/.talisman.

Printing the Output of the Scan

The output generated by the script may be made available for download with the artifacts parameter. The report may also be printed to the job logs with:

python -m json.tool talisman_reports/data/report.json

Python’s json.tool module formats the JSON before printing, so it is preferred over plain commands such as cat.
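To make the formatting step concrete, here is a rough Python sketch of what json.tool does with the report: parse the file, then print it back with indentation. The function name format_report is hypothetical, and the sketch makes no assumption about the fields inside the Talisman report.

```python
import json


def format_report(path: str) -> str:
    """Return the JSON file at `path` re-serialised with four-space indentation."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)          # fails loudly if the file is not valid JSON
    return json.dumps(data, indent=4)     # one key or list item per line
```

For example, print(format_report("talisman_reports/data/report.json")) gives roughly the same output as the command above. A side benefit of either approach over cat is that a truncated or invalid report raises an error instead of being printed silently.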

Overview of the Gitlab CI YML

With that, we have the final .gitlab-ci.yml (src):

stages:
  - talisman

Talisman:
  stage: talisman
  before_script:
    - mkdir ~/.talisman
    - cd ~/.talisman 
    - curl -L -O https://github.com/thoughtworks/talisman/releases/download/v1.3.0/talisman_linux_386
    - chmod +x talisman_linux_386
    - cd -
  script:
    - ~/.talisman/talisman_linux_386 --scan
  only:
    - master
  after_script:
    - python -m json.tool talisman_reports/data/report.json
  artifacts:
    when: always
    paths:
      - talisman_reports

When the job above runs on a branch with no secrets checked in, such as the master branch, it passes. <Job logs>

If the repo has secrets checked in, as on the sensitive-information-talisman-scan branch, the job fails, as shown in the job logs.
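GitLab marks a job as failed when a command in its script exits with a non-zero status. The sketch below shows that mechanism in Python, assuming (as the failed job suggests) that the scan exits non-zero when it finds secrets. run_scan is a hypothetical wrapper, not part of Talisman.

```python
import subprocess
import sys


def run_scan(command: list) -> int:
    """Run `command` and return its exit code; anything other than 0 would fail a CI job."""
    completed = subprocess.run(command)
    if completed.returncode != 0:
        # Report on stderr so the message shows up in the job log
        print("scan exited with code", completed.returncode, file=sys.stderr)
    return completed.returncode
```

A wrapper script could end with sys.exit(run_scan(["/path/to/talisman_linux_386", "--scan"])) so that the pipeline sees the same exit status as the scan itself. The path here is a placeholder.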

Talisman HTML reporting

Apart from the JSON output, Talisman can also produce an HTML report, which is easier to share with non-technical stakeholders.

To get the result in HTML, run the CLI with the scanWithHtml parameter. This requires some extra configuration in the before_script. Unlike the scan parameter, scanWithHtml writes its output to the talisman_html_report directory.

The following .gitlab-ci.yml makes the Talisman job output the report in HTML.

src: the sensitive-information-talisman-html-reporting branch

stages:
  - talisman

Talisman:
  stage: talisman
  before_script:
    - mkdir ~/.talisman
    - curl https://github.com/jaydeepc/talisman-html-report/archive/v1.3.zip -o ~/.talisman/talisman_html_report.zip -J -L
    - cd ~/.talisman && unzip talisman_html_report.zip -d . && mv talisman-html-report-1.3 talisman_html_report && rm talisman_html_report.zip
    - curl -L -O https://github.com/thoughtworks/talisman/releases/download/v1.3.0/talisman_linux_386
    - chmod +x talisman_linux_386
    - cd -
  script:
    - ~/.talisman/talisman_linux_386 --scanWithHtml
  artifacts:
    when: always
    paths:
      - talisman_html_report/

Once the job has run, the artifacts are available for download. (The artifacts for the repo used in this blog may be downloaded from here.) Once downloaded and extracted, start a server to host the report.

Here we use Python 2's SimpleHTTPServer module to start the server. On Python 3 the equivalent command is python3 -m http.server 8080.

cd ~/Downloads/talisman_html_report
python -m SimpleHTTPServer 8080

After starting the server, the HTML report can be viewed in a browser at http://localhost:8080/ on the same machine.
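SimpleHTTPServer exists only in Python 2. As a sketch for Python 3.7 or later, the standard http.server module can serve the extracted report directory from a small script. serve_report is a hypothetical helper name, and the directory path is whatever location the artifacts were extracted to.

```python
import functools
import http.server
import threading


def serve_report(directory: str, port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Serve `directory` on 127.0.0.1:`port` from a background thread and return the server."""
    # Bind the handler to the report directory instead of the current working directory
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() and server.server_close() on the returned object when done. Binding to 127.0.0.1 keeps the report, which may list real secrets, reachable only from the local machine.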

Caveats

Talisman CLI in the pipeline should only be used as a safety net. It is advisable to install the Talisman git hooks on developer workstations, as it is better to prevent secrets from leaving the workstations in the first place.

Also, at the time of this writing, the Talisman CLI scans the entire repository, including previous commits. For legacy systems and long-running projects this might be time-consuming, so the Talisman job should run in parallel with the rest of the pipeline rather than block it. In GitLab CI, jobs in the same stage run in parallel, while stages run one after another.