Automating password rotation for Oracle databases

Maris Elsins

May 26, 2017

Tags: Oracle, Automation, Ansible, Technical Track, Rman

Password rotation is not the most exciting task in the world, and that's exactly why it's a perfect candidate for automation. Automating routine tasks like this are good for everyone - DBAs can work on something that's more exciting, companies save costs as less time is spent on changing the passwords, and there's no place for human error, either. At Pythian, we typically use Ansible for task automation, and I like it mainly because of its non-intrusive configuration (no agents need to be installed on the target servers), and its scalability (tasks are executed in parallel on the target servers). This post will briefly describe how I automated password rotation for oracle database users using Ansible.

Overview

This blog post is not an intro to what is Ansible and how to use it, but it's rather an example of how a simple task can be automated using Ansible in a way that's scalable, flexible and easily reusable, and also provides the ability for other tasks to pick up the new passwords from a secure password store.

Scalability - I'd like to take advantage of Ansible's ability of executing tasks on multiple servers at the same time. For example, in a large environments of tens or hundreds of machines, a solution that executes password change tasks serially would not be suitable. This would be an example of a "serial" task (it's not a real thing, but just an illustration that it "hardcodes" a few "attributes" (environment file, the username and the hostname), and creating a separate task for every user/database you'd want to change the password for would be required:

- hosts: ora-serv01 remote_user: oracle tasks: - name: change password for SYS shell: | . TEST1.env && \ sqlplus / as sysdba @change_pasword.sql SYS \ \"\"
Flexible - I want to be able to adjust the list of users for which the passwords are changed, and the list of servers/databases that the user passwords are changed for in a simple way, that doesn't include changing the main task list.
Reusable - this comes together with flexibility. The idea is that the playbook would be so generic, that it wouldn't require any changes when it's implemented in a completely separate environment (i.e. for another client of Pythian)
Secure password store - the new passwords are to be generated by the automated password rotation tool, and a method of storing password securely is required so that the new passwords could be picked up by the DBAs, application owners or the next automated task that would reconfigure the application

The implementation

Prerequisites

I chose to do the implementation using Ansible 2.3, because it introduces the passwordstore lookup, which enables interaction with the pass utility (read more about it in Passwordstore.org). pass is very cool. It store passwords in gpg-encrypted files, and it can also be configured to automatically update the changes to a git repository, which relieves us of the headache of password distribution. The password can be retrieved from git on the servers that need the access to the new passwords. Ansible 2.3 runs on python 2.6, unfortunately, the passwordstore lookup requires Python 2.7, which can be an issue if the control host for Ansible runs on Oracle Linux 6 or RHEL 6, as they don't provide Python 2.7 in the official yum repositories. Still, there are ways of getting it done, and I'll write another blog post about it. So, what we'll need is:

Ansible 2.3
jmespath plugin on Ansible control host (pip install jmespath)
jinja2 plugin on Ansible control host (I had to update it using pip install -U jinja2 in few cases)
Python 2.7 (or Python 3.5)
pass utility

The Playbook

This is the whole list of files that are included in the playbook:

./chpwd.yml ./inventory/hosts ./inventory/orcl1-vagrant-private_key ./inventory/orcl2-vagrant-private_key ./roles/db_users/files/change_password.sql ./roles/db_users/files/exists_user.sql ./roles/db_users/defaults/main.yml ./roles/db_users/tasks/main.yml [/code]

Let's take a quick look at all of them:

./chpwd.yml - is the playbook and (in this case) it's extremely simple as I want to run the password change against all defined hosts: $ cat ./chpwd.yml --- - name: password change automation hosts: all roles: - db_users
./inventory/hosts, ./inventory/orcl1-vagrant-private_key, ./inventory/orcl2-vagrant-private_key - these files define the hosts and the connectivity. In this case we have 2 hosts - orcl1 and orcl2, and we'll connect to vagrant user using the private keys.
$ cat ./inventory/hosts [orahosts] orcl1 ansible_host=127.0.0.1 ansible_port=2201 ansible_ssh_private_key_file=inventory/orcl1-vagrant-private_key ansible_user=vagrant orcl2 ansible_host=127.0.0.1 ansible_port=2202 ansible_ssh_private_key_file=inventory/orcl2-vagrant-private_key ansible_user=vagrant
./roles/db_users/files/change_password.sql - A sql script that I'll execute on the database to change the passwords. It takes 2 parameters the username and the password:
$ cat ./roles/db_users/files/change_password.sql set ver off pages 0 alter user &1 identified by "&2"; exit;
./roles/db_users/files/exists_user.sql - A sql script that allows verifying the existence of the users. It takes 1 argument - the username. It outputs "User exists." when the user is there, and "User {username} does not exist." - when it's not.
$ cat ./roles/db_users/files/exists_user.sql set ver off pages 0 select 'User exists.' from all_users where username=upper('&1') union all select 'User '||upper('&1')||' does not exist.' from (select upper('&1') from dual minus select username from all_users); exit;
./roles/db_users/defaults/main.yml - is the default file for the db_users role. I use this file to define the users for each host and database for which the passwords need to be changed:
$ cat ./roles/db_users/defaults/main.yml --- db_users: - name: TEST1 host: orcl1 env: ". ~/.bash_profile && . ~/TEST1.env > /dev/null" pwdstore: "orcl1/TEST1/" os_user: oracle become_os_user: yes users: - dbsnmp - system - name: TEST2 host: orcl2 env: ". ~/.bash_profile && . ~/TEST2.env > /dev/null" pwdstore: "orcl2/TEST2/" os_user: oracle become_os_user: yes users: - sys - system - ctxsys - name: TEST3 host: orcl2 env: ". ~/.bash_profile && . ~/TEST3.env > /dev/null" pwdstore: "orcl2/TEST3/" os_user: oracle become_os_user: yes users: - dbsnmp
In this data structure, we define everything that's needed to be known to connect to the database and change the passwords. each entry to the list contains the following data:
- name - just a descriptive name of the entry in this list, normally it would be the name of the database that's described below.
- host - the host on which the database resides. It should match one of the hosts defined in ./inventory/hosts.
- env - how to set the correct environment to be able to connect to the DB (currently it requires sysdba connectivity).
- pwdstore - the path to the folder in the passwordstore where the new passwords will be stored.
- os_user and become_os_user - these are used in case sudo to another user on the target host is required. In a typical configuration, I connect to the target host using a dedicated user for ansible, and then sudo to the DB owner. if ansible connects to the DB onwer directly, then become_os_user should be set to "no".
- users - this is the list of all users for which the passwords need to be changed.
As you see, this structure greatly enhances the flexibility and reusability, because adding new databases, hosts or users to the list would be done by a simple change to the "db_users:" structure in this defaults file. In this example, dbsnmp and system passwords are rotated for TEST1@orcl1, sys, system and ctxsys passwords are rotated for TEST2@orcl2, and dbsnmp on TEST3@orcl2
./roles/db_users/tasks/main.yml - this is the task file of the db_users role. The soul of the playbook and the main part that does the password change depending on the contents in the defaults file described above. Instead of pasting the whole at once, I'll break it up task by task, and will provide some comments about what's being done.
- populate host_db_users - This task simply filters the whole db_users data structure that's defined in the defaults file, and creates host_db_users fact with only the DBs that belong to the host the task is currently run on. Using the ansible "when" conditional would also be possible to filter the list, however in such case there's a lot of "skipped" entries displayed when the task is executed, so I prefer filtering the list before it's even passed to the Ansible task.
  --- - name: populate host_db_users set_fact: host_db_users="[]"
- create directory for target on db hosts - for each unique combination of os_user and become_os_user on the target host, and "ansible" directly is created. A json_query is used here, to filter just the os_user and become_os_user attributes that are needed. It would also work with with_items: "", but in that case, the outputs become cluttered as the attributes are displayed during the execution.
  
  - name: create directory for target on db hosts file: path: "ansible" state: directory become_user: "{{item.os_user}}" become: "{{item.become_os_user}}" with_items: "[]"
- copy sql scripts to db_hosts - the missing scripts are copied from Ansible control host to the target "ansible" directories. "with_nested" is the method to create a loop in Ansible.
  
  - name: copy sql scripts to db_hosts copy: src="" dest=ansible/ mode=0644 become_user: "" become: "" with_nested: - "[]" - ['files/change_password.sql','files/exists_user.sql']
- verify user existence - I'm using a shell module to execute the sql script after setting the environment. The outputs are collected in "exists_output" variable. This task will not fail and will not show as "changed" because of failed_when and changed_when settings of "false".
  
  - name: verify user existence shell: | && \ sqlplus -S / as sysdba \ @ansible/exists_user.sql register: exists_output become_user: "" become: "" with_subelements: - "" - users failed_when: false changed_when: false
- User existence results - this task will fail when any of the users didn't exist, and will display which user it was. This is done in a separate task to produce cleaner output, and in case it's not wanted to fail if any of the users don't exist (continue to change passwords for the existing users), this task can simply be commented or the "failed_when: false" can be uncommented.
  - name: User existence results fail: msg="{{item}}" with_items: "[]" #failed_when: false
- generate and change the user passwords - finally, this is the task that actually changes the passwords. The successful password change is detected by checking the output from the sqlscript, which should produce "User altered." The rather complex use of lookups is there for a reason: the passwordstore lookup can also generate passwords, but it's not possible to define the character classes that the new password should contain, however the "password" lookup allows defining these. Additionally, the 1st character is generated only containing "ascii_letters", as there are usually some applications that "don't like" passwords that start with numbers (this is why generating the 1st letter of the password is separated from the remaining 11 characters. And lastly, the "passwordstore" lookup is used with the "userpass=" parameter to pass and store the generated password into the passwordstore (and it also keeps the previous passwords). This part could use some improvement as in some cases different rules for the generated password complexity may be required. The password change outputs are recorded in "change_output" that's checked in the last task.
  
  - name: generate and change the user passwords shell: | && \ sqlplus -S / as sysdba \ @ansible/change_password.sql \ \"\" register: change_output become_user: "" become: "" with_subelements: - "" - users failed_when: false changed_when: "'User altered.' in change_output.stdout"
- Password change errors - The "change_output" data are verified here, and failed password changes are reported.
  # fail if the password change failed. - name: Password change errors fail: msg="{{item}}" with_items: "[]"

It really works!

Now, when you know how it's built - it's time to show how it works! Please pay attention to the following:

The password store is empty at first
The whole password change playbook completes in 12 seconds
The tasks on both hosts are executed in parallel (see the order of execution feedback for each task)
The passwordstore contains the password entries after the playbook completes, and they can be retrieved by using the pass command

$ pass Password Store $ time ansible-playbook -i inventory/hosts chpwd.yml PLAY [pasword change automation] ******************************************************* TASK [Gathering Facts] ***************************************************************** ok: [orcl1] ok: [orcl2] TASK [db_users : populate host_db_users] *********************************************** ok: [orcl1] ok: [orcl2] TASK [db_users : create directory for target on db hosts] ****************************** changed: [orcl1] => (item={'become_os_user': True, 'os_user': u'oracle'}) changed: [orcl2] => (item={'become_os_user': True, 'os_user': u'oracle'}) TASK [db_users : copy sql scripts to db_hosts] ***************************************** changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql']) changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql']) changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql']) changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql']) TASK [db_users : verify user existance] ************************************************ ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys')) ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp')) ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system')) ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system')) ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys')) ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp')) TASK [db_users : User existance results] *********************************************** TASK [db_users : generate and change the user passwords] ******************************* changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys')) changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp')) changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system')) changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system')) changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys')) changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST3/', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp')) TASK [db_users : Password change errors] *********************************************** PLAY RECAP ***************************************************************************** orcl1 : ok=6 changed=3 unreachable=0 failed=0 orcl2 : ok=6 changed=3 unreachable=0 failed=0 real 0m12.418s user 0m8.590s sys 0m3.900s $ pass Password Store |-- orcl1 | |-- TEST1 | |-- dbsnmp | |-- system |-- orcl2 |-- TEST2 | |-- ctxsys | |-- sys | |-- system |-- TEST3 |-- dbsnmp $ pass orcl1/TEST1/system HDecEbjc6xoO lookup_pass: First generated by ansible on 26/05/2017 14:28:50

Conclusions

For past 2 months I've been learning Ansible and trying it for various DBA tasks. It hasn't always been a smooth ride, as I had to learn quite a lot, because I wasn't exposed much to beasts like jinja2, json_query, YAML, python (very handy for troubleshooting) and Ansible itself before. I feel that my former PL/SQL coder's experience had created some expectations from Ansible, that turned out not to be true. The biggest challenges to me were getting used to the linear execution of the playbook (while with PL/SQL I can call packages, functions, etc. to process the data "outside" the main linear code line), and the lack of execution feedback, because one has to learn creating Ansible tasks in a way that they either succeed or fail (no middle states like 'this is a special case - process it differently'), as well as the amount of visual output is close to none - which does make sense to some degree, it's "automation" after all, right? Nobody should be watching :)

A separate struggle for me was working with the complex data structure that I created for storing the host/database/user information. It's a mix of yaml "dictionary" and "list", and it turned out to be difficult to process it in a way I wanted - this is why I used the json_query at times (although not in a very complex way in this case). There are probably simpler ways I didn't know of (didn't manage finding), and I'd be glad if you'd let me know of possible improvements or even other approaches to such tasks that you have worked on and implemented. Despite all the complaining above, I think it's really worth investing time in automating tasks like this, it really works and once done it doesn't require much attention. Happy Automating!

Insight and analysis of technology and business strategy