Trends and news around the convergence of data, cloud and infrastructure

Importing Data With PowerShell and dbatools

I like to use public datasets for experimentation and presentation demos, especially data that people can easily understand and relate to. For some of them, staying up to date was a manual process of downloading files, loading tables, and merging. There are of course many better ways to do this, some more automated than others. I could have simply used PowerShell to call bcp, or even implemented an insert statement and some loops. Then I found dbatools, whose commands let me do an even better job with far less work – just the way I like it! Here’s how I now keep my datasets current:

Getting The Data

I’ll be using data from the City of Chicago’s Data Portal. They have a tremendous online resource with lots of public datasets available. One that I really like is their listing of towed vehicles. Any time the city tows or impounds a vehicle, a record gets added here and remains for 90 days. It’s very manageable, with only 10 columns and a few thousand rows. (As an added bonus, you can search for license plates you know and then ask your friends about their experience at the impound lot!)

Chicago’s data portal uses Socrata, which is a very well-documented and easy-to-use tool for exposing data. It has a wonderful API for querying and accessing data, but to keep things simple for this post we’re just going to download a CSV file.
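
For the curious, a Socrata API call is just an HTTP request. Here’s a minimal sketch of pulling a few rows with PowerShell – note that the resource ID in the URL below is my guess for this dataset, so grab the real one from the portal’s API page:

# Minimal Socrata API sketch -- the resource ID is a placeholder guess;
# the $limit parameter caps how many rows come back
$apiUrl = 'https://data.cityofchicago.org/resource/ygr5-vcbg.json?$limit=5'
Invoke-RestMethod -Uri $apiUrl | Format-Table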

If you’re on the page for a dataset, you can download it by clicking “Export” at the top right and then selecting “CSV”. To skip the clicking, the direct link to download a CSV of this dataset is here. Download it and take a look at what we’ve got using your spreadsheet or text editor of choice (mine is Notepad++).
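
If you’d rather not touch the browser at all, PowerShell can fetch the file for you. A minimal sketch – the resource ID in the URL is again a placeholder, so substitute the one from your dataset’s export link – which also sets up the $downloadFile variable used in the import step later:

# Download the CSV to the temp directory (resource ID is a placeholder)
$downloadFile = "$env:TEMP\TowedVehicles.csv"
Invoke-WebRequest -Uri 'https://data.cityofchicago.org/api/views/ygr5-vcbg/rows.csv?accessType=DOWNLOAD' -OutFile $downloadFile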

Loading The Data

We’ve got our data, now let’s load it. I like to load the entire downloaded dataset into a stage table, and then copy new rows I haven’t previously seen into my production table that I query from. Here’s the script to create these tables:

-- CREATE STAGE TABLE
CREATE TABLE [dbo].[TowedVehiclesSTG](
  [TowDate] [date] NOT NULL,
  [Make] [nchar](4) NULL,
  [Style] [nchar](2) NULL,
  [Model] [nchar](4) NULL,
  [Color] [nchar](3) NULL,
  [Plate] [nchar](8) NULL,
  [State] [nchar](2) NULL,
  [TowedToFacility] [nvarchar](75) NULL,
  [FacilityPhone] [nchar](14) NULL,
  [ID] [int] NOT NULL
);

-- CREATE FINAL TABLE
CREATE TABLE [dbo].[TowedVehicles](
  [ID] [int] NOT NULL,
  [TowDate] [date] NOT NULL,
  [Make] [nchar](4) NULL,
  [Style] [nchar](2) NULL,
  [Model] [nchar](4) NULL,
  [Color] [nchar](3) NULL,
  [Plate] [nchar](8) NULL,
  [State] [nchar](2) NULL,
  [TowedToFacility] [nvarchar](75) NULL,
  [FacilityPhone] [nchar](14) NULL,
  CONSTRAINT PK_TowedVehicles PRIMARY KEY CLUSTERED (ID)
);

Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import runs within a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you load into a table that doesn’t exist yet, the command will create it and do its best to guess the appropriate datatypes. To use it, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table, and it will take care of the rest.

# Load from CSV into staging table
Import-DbaCsvToSql -Csv $downloadFile -SqlInstance InstanceName -Database TowedVehicles -Table TowedVehiclesSTG `
  -Truncate -FirstRowColumns

The two parameters on the second line tell the command to truncate the table before loading, and that the first line of the CSV file contains column names.

Now the data has been staged, but since this dataset contains all cars towed over the past 90 days, chances are very good that some of these tows are already in my production table from a previous download. A simple query that inserts only the staged rows not already in production will do the trick. I run this query using another dbatools command, Invoke-Sqlcmd2.

# Move new rows from staging into production table
Invoke-Sqlcmd2 -ServerInstance InstanceName -Database TowedVehicles `
  -Query "INSERT INTO [dbo].[TowedVehicles]
SELECT
  [ID],
  [TowDate],
  [Make],
  [Style],
  [Model],
  [Color],
  [Plate],
  [State],
  [TowedToFacility],
  [FacilityPhone]
FROM (
  SELECT
    s.*,
    ROW_NUMBER() OVER (PARTITION BY s.ID ORDER BY s.ID) AS n
  FROM [dbo].[TowedVehiclesSTG] s
  LEFT JOIN [dbo].[TowedVehicles] v ON s.ID = v.ID
  WHERE v.ID IS NULL
) a
WHERE a.n = 1"

The ID column uniquely identifies each tow event, and the production table uses it as a primary key. However, I have found that the dataset occasionally contains duplicate rows, so the ROW_NUMBER() window function guards against that by ensuring each ID is inserted only once.

Putting it all together

I’ve shown you how simple dbatools makes it to load a CSV file into a table and then run a query to move rows from staging into production, but the beauty of PowerShell is that it’s easy to do far more than that. I actually scripted this entire process, including downloading the data! You can download the full PowerShell script, along with a T-SQL script for creating the tables, from my GitHub here.
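
If you just want the shape of it, here’s a condensed sketch of the whole pipeline – $datasetUrl and $mergeQuery are placeholders standing in for the download link and the INSERT query shown above:

# Sketch of the end-to-end refresh: download, stage, merge
# ($datasetUrl and $mergeQuery are placeholders for the values shown earlier)
$downloadFile = "$env:TEMP\TowedVehicles.csv"
Invoke-WebRequest -Uri $datasetUrl -OutFile $downloadFile

# Reload the staging table, then move unseen rows into production
Import-DbaCsvToSql -Csv $downloadFile -SqlInstance InstanceName -Database TowedVehicles -Table TowedVehiclesSTG `
  -Truncate -FirstRowColumns
Invoke-Sqlcmd2 -ServerInstance InstanceName -Database TowedVehicles -Query $mergeQuery

# Clean up the downloaded file
Remove-Item $downloadFile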

Happy Data Loading!

This post was cross-posted from Bob’s personal technical blog at bobpusateri.com.


Bob Pusateri presenting at Chicago Suburban SQL Server Users Group

Bob Pusateri from our team is proud to be presenting a session called “Locks, Blocks, and Snapshots: Maximizing Database Concurrency” at the Chicago Suburban SQL Server Users Group on April 17th at 6pm.

Abstract: The ability for multiple processes to query and update a database concurrently has long been a hallmark of database technology, but this feature can be implemented in many ways. This session will explore the different isolation levels supported by SQL Server and Azure SQL Database, why they exist, how they work, how they differ, and how In-Memory OLTP fits in. Demonstrations will also show how different isolation levels can determine not only the performance, but also the result set returned by a query. Additionally, attendees will learn how to choose the optimal isolation level for a given workload, and see how easy it can be to improve performance by adjusting isolation settings. An understanding of SQL Server’s isolation levels can help relieve bottlenecks that no amount of query tuning or indexing can address – attend this session and gain Senior DBA-level skills on how to maximize your database’s ability to process transactions concurrently.

RSVP for this event today!


Free Webinar – DBAs vs. SysAdmins in Cloud Availability

SIOS and Heraflux are proud to host a joint free webinar entitled “DBAs vs. SysAdmins in Cloud Availability” on Thursday, April 19th, at 1pm Eastern. This webinar is hosted by MSSQL Tips.

Database and system administrators have historically had different perspectives on many topics such as high availability, disaster recovery, and performance tuning. Since one silo generally does not have full visibility into the others, the age-old shouting match erupts whenever an availability challenge occurs. “Your systems must have the issue, not mine!” is a constant theme during these situations. However, moving these critical systems to the cloud presents some new challenges. Availability becomes an even more critical topic, as outages can occur more randomly than with on-premises systems, and the two sides must work more closely to achieve system availability that meets their organization’s SLAs. Come learn tips on how to work with your system administrators to achieve a higher level of availability for your critical SQL Servers in the cloud.

Register today for this exciting interactive webinar!


David Klee Awarded VMware vExpert for 2018

Heraflux is very proud to announce that David Klee has been re-awarded the VMware vExpert award for 2018. This award recognizes individuals who are engaged with VMware-oriented technical communities around the world, and only around 1,600 individuals received it this year. Thanks to VMware for the effort put into this program – we look forward to continuing our advocacy efforts!


New Case Studies Added

Heraflux continues to work hard to serve our customers’ needs, including those looking to move into the cloud. We have posted a few new case studies highlighting some of our recent successes.

Check them out!