A couple of months ago, Microsoft announced two additional command line tools for SQL Server: mssql-scripter and DBFS. The latter drew my attention because it exposes live data from SQL Server DMVs as virtual files in a virtual directory on Linux. Microsoft has probably taken another positive step in SQL Server's adoption on Linux with this tool. Indeed, in the Linux world we usually get kernel performance and configuration data either directly from procfs or indirectly through tools that read procfs in the background.
DBFS uses the FUSE filesystem module to expose the DMVs and, according to the Microsoft blog, each read of a DMV file is a live query, so we always get fresh data.
A quick look at the package dependencies confirms the FUSE requirement:

[mikedavem@sql server]$ rpm -qR $(rpm -qa | grep dbfs)
glibc
fuse
fuse-devel
freetds
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadIsXz) <= 5.2-1
After installing DBFS, let’s start the tool by using the following command line:
[mikedavem@sql mssql]$ dbfs -m /var/opt/mssql/dmv -c /var/opt/mssql/secrets/dmvtool.conf -v
1: Processing entry for section server in configuration file:
SUCCESSFULLY added entry for server server.
The configuration file (-c parameter) stores the sensitive information used to connect to the SQL Server instance (hostname, user and plain-text password). You should consider protecting this file with suitable permissions.
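For instance, restricting the file to its owner is usually enough. A minimal sketch (the path is the one used in this post; the temp-file fallback only makes the snippet runnable on a machine without DBFS):

```shell
# Restrict the DBFS configuration file to its owner only.
# Path assumed from this post; fall back to a temp file so the
# snippet can run anywhere.
conf=/var/opt/mssql/secrets/dmvtool.conf
[ -w "$conf" ] || conf=$(mktemp)
chmod 600 "$conf"          # read/write for the owner, nothing for group/others
stat -c '%a' "$conf"       # prints 600
```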
DBFS runs as a background process by default, but you can change this behavior with the -f parameter at startup.
[mikedavem@sql mssql]$ ps -e | grep dbfs
 2673 ?        00:00:00 dbfs
At this stage, the negative point is that DBFS must be started manually if the server restarts. I thought about wrapping DBFS in a cron job that runs at server startup, but my preference would be the ability to control DBFS through systemd as a service.
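Until the tool ships with one, a minimal systemd unit could fill the gap. This is only a sketch: the unit name, binary path and options below are assumptions based on the commands used earlier in this post, not something provided by the DBFS package:

```ini
# /etc/systemd/system/dbfs.service (hypothetical unit, not shipped with DBFS)
[Unit]
Description=DBFS - SQL Server DMVs exposed as a FUSE filesystem
After=network.target

[Service]
Type=simple
# -f keeps DBFS in the foreground so systemd can supervise it
ExecStart=/usr/bin/dbfs -f -m /var/opt/mssql/dmv -c /var/opt/mssql/secrets/dmvtool.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Once installed, a simple `systemctl enable dbfs` would take care of starting it at boot.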
Exploring the DMV mounted filesystem
In fact, every DMV is exposed in two formats: a regular tab-separated file and a JSON file.
[mikedavem@sql server]$ pwd
/var/opt/mssql/dmv/server
[mikedavem@sql server]$ ls -I "*.json" | wc -l
494
As we can see above, DBFS exposes a lot of DMVs. We may also filter them by the categories described in the BOL by using grep.
For instance, AlwaysOn related DMVs …
[mikedavem@sql server]$ ls -I "*.json" | grep dm_hadr
dm_hadr_automatic_seeding
dm_hadr_auto_page_repair
dm_hadr_availability_group_states
dm_hadr_availability_replica_cluster_nodes
dm_hadr_availability_replica_cluster_states
dm_hadr_availability_replica_states
dm_hadr_cluster
dm_hadr_cluster_members
dm_hadr_cluster_networks
dm_hadr_database_replica_cluster_states
dm_hadr_database_replica_states
dm_hadr_instance_node_map
dm_hadr_name_id_map
dm_hadr_physical_seeding_stats
…
… or database related DMVs
[mikedavem@sql server]$ ls -I "*.json" | grep dm_db
dm_db_column_store_row_group_operational_stats
dm_db_column_store_row_group_physical_stats
dm_db_file_space_usage
dm_db_fts_index_physical_stats
dm_db_index_usage_stats
dm_db_log_space_usage
dm_db_mirroring_auto_page_repair
dm_db_mirroring_connections
dm_db_mirroring_past_actions
dm_db_missing_index_details
dm_db_missing_index_groups
dm_db_missing_index_group_stats
dm_db_partition_stats
dm_db_persisted_sku_features
dm_db_rda_migration_status
dm_db_rda_schema_update_status
dm_db_script_level
dm_db_session_space_usage
dm_db_task_space_usage
…
Note that the schema name prefix (sys) is not present in the DMV file names.
Extracting data from DMV related files
Linux provides very powerful tools to consume data from files, like tr, cut, split, sort, join, cat, grep and awk, to cite a few of them. You may find some examples on GitHub, but I decided to get my own experience by attempting to address some usual DBA queries.
- Extracting data from sys.dm_os_sys_info to retrieve CPU and available memory information on the server, as well as the memory manager's consumption
We may use the cut command to extract the required columns from the dm_os_sys_info file.
[mikedavem@sql server]$ cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info
cpu_count  hyperthread_ratio  physical_memory_kb  committed_kb  committed_target_kb  visible_target_kb  scheduler_count  virtual_machine_type_desc
4          4                  3918848             207160        3914240              3914240            4                HYPERVISOR
But the above output is not very readable, and we may want to display the information differently (in columns rather than in rows). Using awk may be your best option in this case:
[mikedavem@sql server]$ for ((i=1;i<=$(cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info | head -n 1 | wc -w);i++)); do \
    awk '{print $'$i'}' <(cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info) | tr '\n' ' '; echo; \
done | column -t
cpu_count                  4
hyperthread_ratio          4
physical_memory_kb         3918848
committed_kb               207296
committed_target_kb        3914240
visible_target_kb          3914240
scheduler_count            4
virtual_machine_type_desc  HYPERVISOR

Note the `i<=` loop bound: with a strict `<` the loop stops one column short and silently drops the last one (virtual_machine_type_desc).
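The loop above re-reads the file once per column. As an alternative sketch, awk can do the same transposition in a single pass by remembering the header fields and pairing them with the values of the second line (the printf below is just stand-in data mimicking the dm_os_sys_info layout):

```shell
# Transpose a "header line + data line" tab-separated file into
# name/value pairs in one awk pass.
printf 'cpu_count\tscheduler_count\n4\t4\n' |
awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) h[i]=$i}
            NR==2 {for (i=1; i<=NF; i++) print h[i], $i}' | column -t
```

On the real file, replace the printf with `cut -d$'\t' -f3,4,5,7,8,9,15,27 dm_os_sys_info`.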
- Extracting information from sys.databases
A typical query is to retrieve database information including name, id, state, recovery model and owner. Basically, we have to join two DMVs to get all the requested information: sys.databases and sys.server_principals. In the Linux world, you may use the join command to gather information from different files, but it implies pre-sorting each data file first.
But getting the desired output may be challenging. Indeed, we will probably first reach for the grep / cut tools to extract only the rows and columns we want. However, using such tools loses the column context (the header line), and raw column values from the DMV files may be meaningless in this case, as shown below:
[mikedavem@sql server]$ join -1 2 -2 3 -t$'\t' -o 2.1,2.2,2.3,2.4,2.5,1.1 \
    <(grep -Ev '^(NT|##)|SERVER_ROLE' server_principals | cut -d$'\t' -f 1,3 | sort -t$'\t' -k2) \
    <(cut -d$'\t' -f 1,2,4,14,22 databases | sort -t$'\t' -k3) | column -t
model             3  01  ONLINE  FULL    sa
test              6  01  ONLINE  FULL    sa
ApplixEnterprise  5  01  ONLINE  SIMPLE  sa
master            1  01  ONLINE  SIMPLE  sa
msdb              4  01  ONLINE  SIMPLE  sa
tempdb            2  01  ONLINE  SIMPLE  sa
Preserving the column context may also become a concern when you want to sort the column values. Indeed, let's say you want to sort the above output by the database_id column: you will quickly notice that the header line gets included in the sort. Definitely not the result we expect. In order to meet our requirement, we may once again use the very powerful awk command as shown below:
[mikedavem@sql server]$ join --header -1 2 -2 3 -t$'\t' -o 2.1,2.2,2.3,2.4,2.5,1.1 \
    <(head -n1 server_principals | cut -d$'\t' -f 1,3; grep -Ev '^(NT|##)|SERVER_ROLE' server_principals | cut -d$'\t' -f 1,3 | sort -t$'\t' -k2) \
    <(head -n1 databases | cut -d$'\t' -f 1,2,4,14,22; cut -d$'\t' -f 1,2,4,14,22 databases | sort -t$'\t' -k3) \
    | awk 'NR<2{print $0;next}{print $0 | "sort -k2"}' | column -t
name              database_id  owner_sid  state_desc  recovery_model_desc  name
master            1            01         ONLINE      SIMPLE               sa
tempdb            2            01         ONLINE      SIMPLE               sa
model             3            01         ONLINE      FULL                 sa
msdb              4            01         ONLINE      SIMPLE               sa
ApplixEnterprise  5            01         ONLINE      SIMPLE               sa
test              6            01         ONLINE      FULL                 sa
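As a side note, awk can also perform the join itself without any pre-sorting, by caching the smaller file in an associative array. A minimal sketch with tiny stand-in files (the field positions mimic the columns selected by the cut commands above: name/sid on one side, name/database_id/owner_sid/state/recovery on the other):

```shell
# Hash join with awk: cache name-by-sid from the principals file,
# then look owners up while streaming the databases file.
# /tmp files below are stand-ins for the real DMV files.
printf 'sa\t01\n' > /tmp/principals_sample
printf 'master\t1\t01\tONLINE\tSIMPLE\nmodel\t3\t01\tONLINE\tFULL\n' > /tmp/databases_sample
awk -F'\t' 'NR==FNR {owner[$2]=$1; next}   # first file: sid -> name
            {print $1, $2, $4, $5, owner[$3]}' \
    /tmp/principals_sample /tmp/databases_sample | column -t
```

The `NR==FNR` condition is only true while awk reads the first file, which is the classic way to load a lookup table before processing the second file.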
We finally managed to display the desired output, but my feeling is that we worked hard for little result, and real queries usually involve more DMVs and more complex joins than the previous example. Furthermore, we often have to aggregate DMV data to get relevant results, and I can easily imagine the additional effort needed to produce the corresponding scripts as the scenarios we have to deal with grow.
There are probably other tools on Linux to make the task easier, but my opinion is that DBFS should include the ability to execute the custom queries DBAs already use day-to-day, to go beyond its current capabilities. We didn't cover the JSON format in this blog. It is another way to consume data from the DMV files but, in my opinion, it targets developers more than DBAs.
Looking at the DBFS GitHub repository and its issues section, I noticed I was neither the first nor the only one to think about these enhancements. Anyway, I like Microsoft's initiative to give us the ability to consume DMV data from the command line in a Linux world, and I'm looking forward to the next features of this tool!
By David Barbarin