Importing Data With PowerShell and dbatools

I like to use public datasets for experimentation and presentation demos, especially data that people can easily understand and relate to. For some of them, keeping the data up to date was a manual process of downloading files, loading tables, and merging. There are of course many better ways to do this, some more automated than others. I could have simply used PowerShell to call bcp, or even implemented an insert statement and some loops. Then I found dbatools, which has commands that let me do an even better job with far less work – just the way I like it! Here’s how I now keep my datasets current:

Getting The Data

I’ll be using data from the City of Chicago’s Data Portal. They have a tremendous online resource with lots of public datasets available. One that I really like is their listing of towed vehicles. Any time the city tows or impounds a vehicle, a record gets added here and remains for 90 days. It’s very manageable, with only 10 columns and a few thousand rows. (As an added bonus, you can search for license plates you know and then ask your friends about their experience at the impound lot!)

Chicago’s data portal uses Socrata, which is a very well-documented and easy-to-use tool for exposing data. It has a wonderful API for querying and accessing data, but to keep things simple for this post we’re just going to download a CSV file.

If you’re on the page for a dataset, you can download it by clicking on “Export” on the top right and then selecting “CSV”. To avoid all that, the direct link to download a CSV of this dataset is here. Download it and take a look at what we’ve got using your spreadsheet or text editor of choice (mine is Notepad++).
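If you’d rather skip the browser entirely, the download step can be scripted too. Here’s a minimal sketch – the dataset URL and output path below are placeholders for illustration, so substitute the actual export link from the portal:

```powershell
# Download the latest CSV export of the dataset
# ($datasetUrl and $downloadFile are placeholder values)
$datasetUrl   = 'https://data.cityofchicago.org/path/to/towed-vehicles.csv'
$downloadFile = "$env:TEMP\TowedVehicles.csv"

Invoke-WebRequest -Uri $datasetUrl -OutFile $downloadFile
```

The `$downloadFile` variable will come in handy again later, since the import command takes a CSV path as its input.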

Loading The Data

We’ve got our data, now let’s load it. I like to load the entire downloaded dataset into a stage table, and then copy new rows I haven’t previously seen into my production table that I query from. Here’s the script to create these tables:

-- CREATE STAGE TABLE
CREATE TABLE [dbo].[TowedVehiclesSTG](
 [TowDate] [date] NOT NULL,
 [Make] [nchar](4) NULL,
 [Style] [nchar](2) NULL,
 [Model] [nchar](4) NULL,
 [Color] [nchar](3) NULL,
 [Plate] [nchar](8) NULL,
 [State] [nchar](2) NULL,
 [TowedToFacility] [nvarchar](75) NULL,
 [FacilityPhone] [nchar](14) NULL,
 [ID] [int] NOT NULL
);

-- CREATE FINAL TABLE
CREATE TABLE [dbo].[TowedVehicles](
 [ID] [int] NOT NULL,
 [TowDate] [date] NOT NULL,
 [Make] [nchar](4) NULL,
 [Style] [nchar](2) NULL,
 [Model] [nchar](4) NULL,
 [Color] [nchar](3) NULL,
 [Plate] [nchar](8) NULL,
 [State] [nchar](2) NULL,
 [TowedToFacility] [nvarchar](75) NULL,
 [FacilityPhone] [nchar](14) NULL,
CONSTRAINT PK_TowedVehicles PRIMARY KEY CLUSTERED (ID)
);

Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is called Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import is within a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you want to load into a table that doesn’t exist yet, this script will create a table and do its best to guess the appropriate datatype. To use, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table. It will take care of the rest.

# Load from CSV into staging table
Import-DbaCsvToSql -Csv $downloadFile -SqlInstance InstanceName -Database TowedVehicles -Table TowedVehiclesSTG `
 -Truncate -FirstRowColumns

The two parameters on the second line tell the command to truncate the table before loading, and that the first line of the CSV file contains column names.

Now the data has been staged, but since this dataset contains all cars towed over the past 90 days, chances are very good that I already have some of these tows in my production table from a previous download. A simple query to insert all rows from staging into production that aren’t already there will do the trick. This query is run using another dbatools command, Invoke-Sqlcmd2.

# Move new rows from staging into production table
Invoke-Sqlcmd2 -ServerInstance InstanceName -Database TowedVehicles `
-Query "INSERT INTO [dbo].[TowedVehicles]
SELECT
 [ID],
 [TowDate],
 [Make],
 [Style],
 [Model],
 [Color],
 [Plate],
 [State],
 [TowedToFacility],
 [FacilityPhone]
FROM (
 SELECT
 s.*,
 ROW_NUMBER() OVER (PARTITION BY s.ID ORDER BY s.ID) AS n
 FROM [dbo].[TowedVehiclesSTG] s
 LEFT JOIN [dbo].[TowedVehicles] v ON s.ID = v.ID
 WHERE v.ID IS NULL
) a
WHERE a.n = 1"

The ID column uniquely identifies each tow event and serves as the production table’s primary key. However, I have found that the dataset occasionally contains duplicated rows. The ROW_NUMBER() window function addresses this by ensuring each ID is inserted only once.

Putting it all together

I’ve shown you how simple dbatools makes it to load a CSV file into a table and then run a query to move data from staging into production, but the beauty of PowerShell is that it’s easy to do far more than that. I actually scripted this entire process, including downloading the data! You can download the full PowerShell script, along with a T-SQL script for creating the tables, from my GitHub here.
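As a rough outline, the whole refresh can live in a single script. This is only a sketch, not the script from my GitHub – the instance name, database name, dataset URL, and the `$insertQuery` variable (holding the insert query shown above) are placeholders, and it assumes the dbatools module is installed:

```powershell
# Sketch of the full refresh process (placeholder names throughout)
Import-Module dbatools

$datasetUrl   = 'https://data.cityofchicago.org/path/to/towed-vehicles.csv'  # placeholder
$downloadFile = "$env:TEMP\TowedVehicles.csv"
$instance     = 'InstanceName'
$database     = 'TowedVehicles'

# 1. Download the current 90-day extract
Invoke-WebRequest -Uri $datasetUrl -OutFile $downloadFile

# 2. Load it into the staging table, truncating first
Import-DbaCsvToSql -Csv $downloadFile -SqlInstance $instance -Database $database `
    -Table TowedVehiclesSTG -Truncate -FirstRowColumns

# 3. Copy previously-unseen rows into the production table
#    ($insertQuery holds the INSERT...ROW_NUMBER() query shown earlier)
Invoke-Sqlcmd2 -ServerInstance $instance -Database $database -Query $insertQuery
```

Schedule something like this with Task Scheduler or a SQL Agent PowerShell job step and the dataset stays current with no manual effort at all.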

Happy Data Loading!

This post was cross-posted from Bob’s personal technical blog at bobpusateri.com.


Bob Pusateri presenting at Chicago Suburban SQL Server Users Group

Bob Pusateri from our team is proud to be presenting a session called “Locks, Blocks, and Snapshots: Maximizing Database Concurrency” at the Suburban Chicago SQL Server Users Group on April 17th at 6pm.

Abstract: The ability for multiple processes to query and update a database concurrently has long been a hallmark of database technology, but this feature can be implemented in many ways. This session will explore the different isolation levels supported by SQL Server and Azure SQL Database, why they exist, how they work, how they differ, and how In-Memory OLTP fits in. Demonstrations will also show how different isolation levels can determine not only the performance, but also the result set returned by a query. Additionally, attendees will learn how to choose the optimal isolation level for a given workload, and see how easy it can be to improve performance by adjusting isolation settings. An understanding of SQL Server’s isolation levels can help relieve bottlenecks that no amount of query tuning or indexing can address – attend this session and gain Senior DBA-level skills on how to maximize your database’s ability to process transactions concurrently.

RSVP for this event today!


Free Webinar - DBAs vs. SysAdmins in Cloud Availability

SIOS and Heraflux are proud to host a joint free webinar entitled “DBAs vs. SysAdmins in Cloud Availability” on Thursday, April 19th, at 1pm Eastern. This webinar is hosted by MSSQL Tips.

Database and system administrators have historically had different perspectives on many topics such as high availability, disaster recovery, and performance tuning. Since one silo generally does not have full visibility into the other silos, the age-old shouting match erupts when an availability challenge occurs. “Your systems must have the issue, not mine!” is a constant theme during these situations. However, moving these critical systems to the cloud presents some new challenges. Availability becomes an even more critical topic, as outages can occur more randomly than with on-premises systems, and the two sides must work more closely to achieve system availability that meets their organization’s SLAs. Come learn tips on how to work with your system administrators to achieve a higher level of availability for your critical SQL Servers in the cloud.

Register today for this exciting interactive webinar!


Mentioned in SIOS Application Trends 2018 Post

We were recently mentioned in a SIOS blog post discussing how applications, including our favorite – Microsoft SQL Server – are moving to the public cloud in droves. The world is expanding into the cloud: maybe not moving everything all at once, but definitely expanding. We are thrilled to be a part of this migration, and eager to work with companies like SIOS in their push to help organizations simplify their environments and save money! Check it out!


Speaking at SQL Saturday Chicago

Heraflux is proud to contribute to this year’s SQL Saturday event in Chicago on March 17. Not only is our own Bob Pusateri one of the primary event coordinators, but David Klee is also presenting a new session called “Level Up Your Cloud Infrastructure Skills”.

Session abstract: Think infrastructure in the cloud is still just for sysadmins? Think again! As your organization moves into the cloud, infrastructure skills are more important than ever for DBAs to master. Expert knowledge of cloud-related infrastructure will help you maintain performance and availability for databases in the cloud. For example, know what an IOP is? How many does your database consume during a given day? Properly sizing a cloud database depends on your knowledge of this metric. Failure to properly configure storage performance at the time of deployment will slow down your SQL Server considerably. Come learn many of the key cloud infrastructure points that you should master as the DBA role continues to evolve!

Register for this exciting event today, and we look forward to meeting you there!


Heraflux is Presenting at SQLBits

Heraflux is extremely proud to have one of our Solutions Architects – Bob Pusateri – speaking at SQLBits – Europe’s largest SQL Server conference! Bob will be delivering two SQL Server administration sessions on February 23rd and 24th in London.

VLDBs: Lessons Learned

Whoever coined the term “one size fits all” was not a DBA. Very large databases (VLDBs) have different needs from their smaller counterparts, and the techniques for effectively managing them need to grow along with their contents. In this session, join Microsoft Certified Master Bob Pusateri as he shares lessons learned over years of maintaining databases over 20TB in size. This talk will include techniques for speeding up maintenance operations before they start running unacceptably long, and methods for minimizing user impact for critical administrative processes. You’ll also see how generally-accepted best practices aren’t always the best idea for VLDB environments, and how, when, and why deviating from them can be appropriate. Just because databases are huge doesn’t mean they aren’t manageable. Attend this session and see for yourself!

SQL Server Administration on Linux

Times are certainly changing with Microsoft’s recent announcement to adopt the Linux operating system with the SQL Server 2017 release, and you should be prepared to support it. But what is Linux? Why run your critical databases on an unfamiliar operating system? How do you do the basics, such as backing up to a network share or adding additional drives for data, logs, and tempdb files?

This introductory session will help seasoned SQL Server DBAs understand the basics of Linux and how it differs from Windows, all the way from basic management to performance monitoring. By the end of the session, you will be able to launch your own Linux-based SQL Server instance on a production-ready VM.

We are thrilled to participate and contribute to this incredible event we’ve been watching from afar for years! Join us at Bob’s two sessions and learn from his years of knowledge!


Guest Post at Pure Storage - VVols and SQL Server are a Game Changer

Heraflux is thrilled to have contributed a blog post over at Pure’s blog on how SQL Server, VVols, and Pure Storage’s unique implementation of VVols combine to make SQL Server DBAs’ jobs better. Pure’s implementation of VVols is simplifying our world and improving our ability to support our businesses.

Read more on it here!

VVols are a Game Changer, and You Should be Excited


SQL Server on Linux Series: Expanding LVM Drives

Next in our SQL Server on Linux series is an important question. On Windows, if you’re about to run out of space, you ask your VM admin or storage admin to expand one or more of your drives, then go to Disk Management and extend the volume with no downtime. How do we accomplish this same task on Linux?
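For a preview of what’s involved, the Linux-side steps typically look something like the following. This is only a sketch – the device and volume names (/dev/sdb, vg_data/lv_data) and the mount point are example values, and the exact commands depend on your distribution and filesystem:

```shell
# After the VM admin grows the underlying virtual disk (run as root):
echo 1 > /sys/class/block/sdb/device/rescan   # make the kernel see the new disk size
pvresize /dev/sdb                             # grow the LVM physical volume
lvextend -l +100%FREE /dev/vg_data/lv_data    # grow the logical volume into the new space
xfs_growfs /var/opt/mssql                     # grow the filesystem (XFS; use resize2fs for ext4)
```

All of these operations happen online, so just like the Windows workflow, no downtime is required. The full post walks through each step in detail.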

Read more


SQL Saturday Nashville 2018

Our very own Bob Pusateri is proud to present a session entitled “Minimizing User Impact with Advanced Restore Methods” at this year’s SQL Saturday in Nashville, TN on January 13. SQL Saturday is a free training event for Microsoft Data Platform professionals and those wanting to learn about SQL Server, Business Intelligence, and Analytics. This event will be held on January 13, 2018 at Middle Tennessee State University (MTSU), 1301 East Main Street, Murfreesboro, TN 37132.

Minimizing User Impact with Advanced Restore Methods

Speaker: Bob Pusateri

Duration: 60 minutes

Track: Database Administration

We all know that backups are only half the battle – restores are what really matter when disaster strikes. Standard restores, while effective, may require additional downtime and further affect the business. This session will demonstrate three advanced restore methods you should know: point-in-time restores, piecemeal restores, and page restores, and will discuss when each method is appropriate. Attend this session to learn how to be a better DBA by minimizing downtime and user impact after disaster has struck!

RSVP today and we’ll see you at this event!


Welcoming the Year 2018

On behalf of the Heraflux Technologies team, we want to wish you a very happy new year! Twenty-seventeen has been an incredible and exciting year for us and our customers. We had an absolute blast traveling and speaking at a number of different events in 2017, including SQL Nexus, Pure Accelerate, VMworld (USA and EMEA), P21 Connect, PASS Summit, and too many SQL Saturdays and SQL Server User Groups to mention.

Our journey is just beginning. The world continues to change and evolve, especially with the seismic shift from on-prem computing to cloud-based computing, and as it changes, so do we. We at Heraflux have set our sights on a new chapter of innovation for 2018, so stay tuned for some exciting announcements over the next few months!