Importing Data With PowerShell and dbatools

I like to use public datasets for experimentation and presentation demos, especially data that people can easily understand and relate to. For some, keeping them up-to-date was a manual process of downloading files, loading tables, and merging. There are of course many better ways to do this, some of which are more automated than others. I could have simply used PowerShell to call bcp, or even just implemented an insert statement and some loops. Then I found dbatools, which has commands which enable me to do an even better job with far less work – just the way I like it!. Here’s how I now keep my datasets current:

Getting The Data

I’ll be using data from the City of Chicago’s Data Portal. They have a tremendous online resource with lots of public datasets available. One that I really like is their listing of towed vehicles. Any time the city tows or impounds a vehicle, a record gets added here and remains for 90 days. It’s very manageable, with only 10 columns and a few thousand rows. (As an added bonus, you can search for license plates you know and then ask your friends about their experience at the impound lot!)

Chicago’s data portal uses Socrata, which is a very well-documented and easy-to-use tool for exposing data. It has a wonderful API for querying and accessing data, but to keep things simple for this post we’re just going to download a CSV file.

If you’re on the page for a dataset, you can download it by clicking on “Export” on the top right and then selecting “CSV”. To avoid all that, the direct link to download a CSV of this dataset is here. Download it and take a look at what we’ve got using your spreadsheet or text editor of choice (mine is Notepad++).

Loading The Data

We’ve got our data, now let’s load it. I like to load the entire downloaded dataset into a stage table, and then copy new rows I haven’t previously seen into my production table that I query from. Here’s the script to create these tables:

-- CREATE STAGE TABLE

CREATE TABLE [dbo].[TowedVehiclesSTG](

 [TowDate] [date] NOT NULL,

 [Make] [nchar](4) NULL,

 [Style] [nchar](2) NULL,

 [Model] [nchar](4) NULL,

 [Color] [nchar](3) NULL,

 [Plate] [nchar](8) NULL,

 [State] [nchar](2) NULL,

 [TowedToFacility] [nvarchar](75) NULL,

 [FacilityPhone] [nchar](14) NULL,

 [ID] [int] NOT NULL

);

-- CREATE FINAL TABLE

CREATE TABLE [dbo].[TowedVehicles](

 [ID] [int] NOT NULL,

 [TowDate] [date] NOT NULL,

 [Make] [nchar](4) NULL,

 [Style] [nchar](2) NULL,

 [Model] [nchar](4) NULL,

 [Color] [nchar](3) NULL,

 [Plate] [nchar](8) NULL,

 [State] [nchar](2) NULL,

 [TowedToFacility] [nvarchar](75) NULL,

 [FacilityPhone] [nchar](14) NULL,

CONSTRAINT PK_TowedVehicles PRIMARY KEY CLUSTERED (ID)

);

Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is called Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import is within a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you want to load into a table that doesn’t exist yet, this script will create a table and do its best to guess the appropriate datatype. To use, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table. It will take care of the rest.

# Load from CSV into staging table

Import-DbaCsvToSql -Csv $downloadFile -SqlInstance InstanceName -Database TowedVehicles -Table TowedVehiclesSTG `

-Truncate -FirstRowColumns

The two parameters on the second line tell the command to truncate the table before loading, and that the first line of the CSV file contains column names.

Now the data has been staged, but since this dataset contains all cars towed over the past 90 days, chances are very good that I already have some of these tows in my production table from a previous download. A simple query to insert all rows from staging into production that aren’t already there will do the trick. This query is run using another dbatools command, Invoke-Sqlcmd2.

# Move new rows from staging into production table

Invoke-Sqlcmd2 -ServerInstance InstanceName -Database TowedVehicles `

-Query "INSERT INTO [dbo].[TowedVehicles]

SELECT

 [ID],

 [TowDate],

 [Make],

 [Style],

 [Model],

 [Color],

 [Plate],

 [State],

 [TowedToFacility],

 [FacilityPhone]

FROM (

 SELECT

 s.*,

 ROW_NUMBER() OVER (PARTITION BY s.ID ORDER BY s.ID) AS n

 FROM [dbo].[TowedVehiclesSTG] s

 LEFT JOIN [dbo].[TowedVehicles] v ON s.ID = v.ID

 WHERE v.ID IS NULL

) a

WHERE a.n = 1"
The ID column uniquely identifies each tow event, and the production table uses it as a primary key, however I have found that occasionally the dataset will contain duplicated rows. The ROW_NUMBER() window function addresses this issue and ensures each ID is attempted to be inserted only once.

Putting it all together

I’ve showed you how simple dbatools makes it to load a CSV file into a table and then run a query to load from staging into production, but the beauty of PowerShell is that it’s easy to do way more than that. I actually scripted this entire process, including downloading the data! You can download the full PowerShell script, along with a T-SQL Script for creating the tables, from my GitHub here.

Happy Data Loading!

This post was cross-posted from Bob’s personal technical blog at bobpusateri.com.


Bob Pusateri presenting at Chicago Suburban SQL Server Users Group

Bob Pusateri from our team is proud to be presenting at the Suburban Chicago SQL Server Users Group on April 17th at 6pm a session called “Locks, Blocks, and Snapshots: Maximizing Database Concurrency.”

Abstract: The ability for multiple processes to query and update a database concurrently has long-been a hallmark of database technology, but this feature can be implemented in many ways. This session will explore the different isolation levels supported by SQL Server and Azure SQL Database, why they exist, how they work, how they differ, and how In-Memory OLTP fits in. Demonstrations will also show how different isolation levels can determine not only the performance, but also the result set returned by a query. Additionally, attendees will learn how to choose the optimal isolation level for a given workload, and see how easy it can be to improve performance by adjusting isolation settings. An understanding of SQL Server’s isolation levels can help relieve bottlenecks that no amount of query tuning or indexing can address – attend this session and gain Senior DBA-level skills on how to maximize your database’s ability to process transactions concurrently.

RSVP for this event today!


Speaking at SQL Saturday Chicago

Heraflux is proud to contribute to this year’s SQL Saturday event in Chicago on March 17. Not only is our own Bob Pusateri one of the primary event coordinators, David Klee is presenting a new session called “Level Up Your Cloud Infrastructure Skills“.

Session abstract: Think infrastructure in the cloud is still just for sysadmins? Think again! As your organization moves into the cloud, infrastructure skills are more important than ever for DBAs to master. Expert knowledge of cloud-related infrastructure will help you maintain performance and availability for databases in the cloud. For example, know what an IOP is? How many does your database consume during a given day? Properly sizing a cloud database depends on your knowledge of this metric. Failure to properly configure storage performance at the time of deployment will slow down your SQL Server considerably. Come learn many of the key cloud infrastructure points that you should master as the DBA role continues to evolve!

Register for this exciting event today, and we look forward to meeting you there!


Heraflux is Presenting at SQLBits

Heraflux is extremely proud to have one of our Solutions Architects – Bob Pusateri – speaking at SQLBits – Europe’s largest SQL Server conference! Bob will be delivering two SQL Server administration sessions on February 23rd and 24th in London.

VLDBs: Lessons Learned

Whoever coined the term “one size fits all” was not a DBA. Very large databases (VLDBs) have different needs from their smaller counterparts, and the techniques for effectively managing them need to grow along with their contents. In this session, join Microsoft Certified Master Bob Pusateri as he shares lessons learned over years of maintaining databases over 20TB in size. This talk will include techniques for speeding up maintenance operations before they start running unacceptably long, and methods for minimizing user impact for critical administrative processes. You’ll also see how generally-accepted best practices aren’t always the best idea for VLDB environments, and how, when, and why deviating from them can be appropriate. Just because databases are huge doesn’t mean they aren’t manageable, attend this session and see for yourself!

SQL Server Administration on Linux

Times are certainly changing with Microsoft’s recent announcement to adopt the Linux operating system with the SQL Server 2017 release, and you should be prepared to support it. But, what is Linux? Why run your critical databases on an unfamiliar operating system? How do I do the basics, such as backing up to a network share or add additional drives for data, logs, and tempdb files?

This introductory session will help seasoned SQL Server DBAs understand the basics of Linux and how it differs from Windows, all the way from basic management to performance monitoring. By the end of the session, you will be able to launch your own Linux-based SQL Server instance on a production ready VM.

We are thrilled to participate and contribute to this incredible event we’ve been watching from afar for years! Join us at Bob’s two sessions and learn from his years of knowledge!


Guest Post at Pure Storage - VVols and SQL Server are a Game Changer

Heraflux is thrilled to have contributed a blog post on how SQL Servers, VVols, and Pure Storage’s unique implementation of VVols combines to make SQL Server DBA jobs better over at Pure’s blog. Pure’s implementation of VVols is simplifying our world and improving our ability to support our businesses.

Read more on it here!

VVols are a Game Changer, and You Should be Excited


SQL Server on Linux Series: Expanding LVM Drives

Next in our SQL Server on Linux series is one important question. On Windows, if you’re about to run out of space, you get your VM admin / storage admin to expand one or more of your drives, and you go to Disk Management and expand the drive with no downtime. How do we accomplish this same task on Linux?

Read more


Calgary SQL Server Users Group Presentation 2017

The addition of virtualization into the datacenter changed everything and gave birth to the cloud, but what is really going on under the hood?

Virtualization and cloud IaaS have a lot of similarities, and in this session we will explore the various layers, such as storage, interconnects, operating systems, the VM construction, all from the standpoint of performance and availability.

Maintaining a working understanding of these layers will make you a stronger database professional. Both on-premises virtualization and cloud will be compared and contrasted.

Bring your questions and learn more about what’s actually under the database!


VMworld 2017 Recap

Heraflux was proud to participate in both VMworld USA and EMEA over the last month, and our enterprise SQL Server content stole the show!

Our VMworld USA experience was simply amazing.

First, David Klee from Heraflux, along with Deji Akomolafe from VMware Corp. presented an all-day boot camp the Saturday before VMworld on SQL Server virtualization, where we presented an incredible amount of tips and tricks on SQL Server-on-VMware performance, availability, and scalability topics.

Both of our general session presentations made the VMworld USA best-of-the-day list for Tuesday, August 29th! They are now available for you to watch and learn from out on YouTube.

Monster VMs (Database Virtualization) with vSphere 6.5: Doing IT Right [VIRT1309BU] with Michael Corey

Performance Tuning and Monitoring for Virtualized Database Servers [VIRT1430BU] with Thomas LaRock

Our panel session with Argenis Fernandez from Pure Storage, Thomas LaRock from Solarwinds, Allan Hirt from SQLHA, and Niran Even-chen from VMware Corp. was also a smash hit.

Thanks Argenis for the great photo!

Special thanks goes to Don Sullivan from VMware Corp. for championing the enterprise database content at these conferences!

Heraflux also announced a partnership with SIOS Corporation for an upcoming add-on for the SIOS iQ product for advanced SQL Server state collection into the iQ machine-learning platform to boost the advanced performance anomaly detection into the application layer in a virtual environment. Stay tuned for more details regarding this exciting development!

David Klee and Sergey Razin, PhD., from SIOS Corporation demonstrating the SQL Server enhancements in SIOS iQ

David Klee and Jon Shields at the SIOS booth in the exhibition hall

David also presented with Argenis Fernandez from Pure Storage in the Pure booth in the exhibition hall on SQL Server storage benchmarking fallacies.

Heraflux also met with our good friends at Rubrik to discuss the impressive state of their SQL Server-integrated backup platform, who won both the VMworld 2017 Best of Show and Gold awards for their incredible platform.

Argenis Fernandez of Pure Storage, David Klee of Heraflux, Biswaroop Palit from Rubrik

Argenis and David also presented in the Intel booth on modern CPU architectures and VMware vNUMA for enterprise SQL Server workloads.

As you can see, our VMworld USA conference was quite the show.

Just one week later, Heraflux participated in VMworld EMEA in Barcelona, Spain.

Just like in Las Vegas, all of our SQL Server content made the featured videos list! They are now available for you to watch and learn from out on the VMworld EMEA featured videos list.

Monster VMs (Database Virtualization) with vSphere 6.5: Doing IT Right [VIRT1309BU] with Michael Corey

Performance Tuning and Monitoring for Virtualized Database Servers [VIRT1430BU] with Thomas LaRock

David also presented a session for the vBrownBag TechTalks series called vNUMA and You.

We even held an impromptu SQL Server architect open Q&A in the VMvillage to help folks successfully improve their SQL Server virtualization experience.

The announcements from VMware at both of these shows were a bit mixed, in our opinion. The announcement of the general availability for VMware Cloud on Amazon AWS is an interesting shift in strategy. Pivotal Container Service shows VMware’s commitment to containers and workload portability, which to us is a sign of the direction of VMware Corp. On-premises virtualization is generally seen as the new beige at this point, and the unified direction of VMware, based on the messaging from these shows, demonstrates that VMware is shifting to attempt to be the conduit between the various platforms for on-prem virtualization and public cloud providers. The jury is still out on how well the industry will adopt and embrace this shift, but we at Heraflux are quite interested to know your thoughts!

We’ll see you at the next one!


Q&A Session with SQL Server on VMware Experts at VMworld EMEA

Greetings everyone! We had so many people stay after our SQL Server on VMware sessions at VMworld USA to ask great questions, I thought I’d extend an offer to everyone at the European show coming up in a week! Let’s talk SQL Server! We are going to crash the VMvillage hang space on Tuesday, September 12 at 3pm local time to hold an open-ended SQL Server on VMware Q&A session. This unscripted and unofficial session is for you, has no official agenda, and marketing is strictly off-limits.

Our illustrious experts include:


Michael Corey, Founder – Corey and Associates

Thomas LaRock, Head Geek – Solarwinds

Argenis Fernandez, Principal Architect – Pure Storage

Niran Even Chen, Double VCDX, VMware Corp.

… and me!

Mark your calendars, and bring your questions, specific challenges, or discussion points, and let’s do this! Anything goes – performance, availability, setup and configuration, conflicts between best practices for SQL Server and VMware, anything!


SQL Server on VMware VMworld USA Videos Now On YouTube

We just saw that our two SQL Server on VMware sessions that David Klee co-presented on Tuesday at VMworld USA, both of which made the top 10 sessions of the day list at the conference, are now on YouTube! They are free to watch for everyone!

For those of you with virtualized SQL Servers, regardless of platform and cloud or on-premises, we urge you to watch these two sessions at your convenience.

Monster VMs (Database Virtualization) with vSphere 6.5: Doing IT Right (VIRT1309BU) (co-presented with Michael Corey)

Performance Tuning and Monitoring for Virtualized Database Servers (VIRT1430BU) (co-presented with Thomas LaRock)