Friday, June 9, 2023

Fun with PowerShell: Deduplicating Records

In the previous post, we got a list of Avengers movies from the Open Movie Database and printed it onto the screen.

This is great, but the real power of PowerShell comes from the ability to manipulate these results. Let's extract just the title from each element of the list.
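Here is a minimal sketch of that step. The $movies array is a hypothetical stand-in for the search results from the previous post, and its imdbID values are placeholders rather than real identifiers.

```powershell
# Hypothetical stand-in for the search results from the previous post.
# The imdbID values are placeholders, not real OMDb identifiers.
$movies = @(
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000001' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000003' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000004' }
)

# Passing a property name to ForEach-Object returns that property's value
# for every object coming down the pipeline.
$titles = $movies | ForEach-Object Title
$titles
```

With the sample data, the same titles show up more than once, which is the problem the rest of this post deals with.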

Pretty neat, but I spot a couple of duplicates. Let's take a look at the raw data again.

The select command (a shorthand for Select-Object) allows us to pare down the data we're getting so we can take a closer look. It looks like we have two exact copies of Avengers Assemble (with the same imdbID) but three different movies called The Avengers.
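A rough sketch of paring the records down, assuming each record carries a few extra fields we don't care about. The sample records and their values are hypothetical placeholders.

```powershell
# Hypothetical records with more fields than we want to look at.
# All values are placeholders, not real OMDb data.
$movies = @(
    [pscustomobject]@{ Title = 'The Avengers';      Type = 'movie'; Poster = 'N/A'; imdbID = 'tt0000001' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; Type = 'movie'; Poster = 'N/A'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; Type = 'movie'; Poster = 'N/A'; imdbID = 'tt0000002' }
)

# Select-Object (alias: select) keeps only the named properties.
$pared = $movies | Select-Object Title, imdbID
$pared
```

Each output object now has just the Title and imdbID properties, which makes exact duplicates much easier to spot.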

What we want to do is dedupe the list of movies based on their imdbID. The sort command (shorthand for Sort-Object) has exactly what we need. We can sort a list of objects by one of their properties, and if we also pass the -Unique switch (which can be shortened to -uniq), sort will eliminate all but the first copy of each object with the same value for that property.
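As a sketch of the dedupe step, again with hypothetical records and placeholder imdbID values: two records share an ID, and three records share a title but have different IDs.

```powershell
# Hypothetical records; the imdbID values are placeholders.
$movies = @(
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000001' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000003' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000004' }
)

# Sort by imdbID and keep one record per distinct imdbID value.
$unique = $movies | Sort-Object imdbID -Unique
$unique
```

I spelled out Sort-Object here because the sort alias is not defined in PowerShell on Linux and macOS, where sort would run the native program instead.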

Pretty cool so far, but what if we want to convert the result to JSON?

No problem! The ConvertTo-Json command will convert the objects to JSON for us.
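A minimal sketch of the conversion, using a small hypothetical list with placeholder imdbID values:

```powershell
# Hypothetical records; the imdbID values are placeholders.
$movies = @(
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000001' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
)

# Dedupe first, then serialize what is left as a JSON array.
$json = $movies | Sort-Object imdbID -Unique | ConvertTo-Json
$json
```

The result is an ordinary string, so it can be written to a file or handed to any other tool that expects JSON.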

You might be asking: how are you supposed to find the ConvertTo-Json command? The cool thing about PowerShell is that commands are named, by convention, as Verb-Noun. As a result, PowerShell comes with a command, Get-Command, that lets you search all of the available commands by verb or by noun.
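For example, a quick sketch of both kinds of query. The exact list you get back depends on your PowerShell version and which modules are installed.

```powershell
# Everything that works with JSON, whatever the verb.
$byNoun = Get-Command -Noun Json
$byNoun

# Everything that converts *to* something, whatever the noun.
$byVerb = Get-Command -Verb ConvertTo
$byVerb
```

Either query would have led us to ConvertTo-Json.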

One pretty cool thing about Get-Command: because it returns an array of objects like any other PowerShell command, we can use what we already learned to tone down the noise of this list.
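Something along these lines, as a sketch. Which columns count as noise is a matter of taste; here I assume we only care about the name and the noun.

```powershell
# Sort the matching commands by noun and keep just two columns.
$convertCommands = Get-Command -Verb ConvertTo |
    Sort-Object Noun |
    Select-Object Name, Noun
$convertCommands
```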

We already covered the fact that you can get help about any command by using -? or Get-Help. But on top of printing out the help in your console, you can open up the help in your default web browser.

Adding the -Online switch to Get-Help opens your default web browser at the documentation page for the command.

Now that we've seen Get-Command -Noun and Get-Command -Verb, we can understand why PowerShell's commands can seem so verbose, but also why we wouldn't want the PowerShell designers to just stick with the shorthand names.

Let's take a look at all of the commands that work with objects that have aliases:

First, I asked PowerShell for a list of all of the aliases in the system. Next, I mapped each alias to its ResolvedCommand property. Finally, I restricted the results to the resolved commands whose noun was Object, and extracted the Name property from each of the matching resolved commands.
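Those steps can be sketched as the pipeline below. The final Sort-Object -Unique is my own addition for this sketch, since several aliases can resolve to the same command.

```powershell
$objectCommands = Get-Alias |
    ForEach-Object ResolvedCommand |    # alias -> the command it points at
    Where-Object Noun -eq 'Object' |    # keep only the *-Object commands
    ForEach-Object Name |               # pull out each command's name
    Sort-Object -Unique                 # collapse repeated names
$objectCommands
```

The exact list varies by platform, because PowerShell on Linux and macOS defines fewer aliases than it does on Windows.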

How did I figure that out?
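One way to explore, as a sketch: pick a single alias, ask Get-Member what properties it has, and then peek inside the interesting one. I'm using the select alias here only as a convenient example.

```powershell
# What properties does an alias object have?
Get-Alias select | Get-Member -MemberType Property

# ResolvedCommand looks promising, so look at a few of its properties.
$resolved = (Get-Alias select).ResolvedCommand
$resolved | Select-Object Name, Verb, Noun
```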

Looks good! We can now use where and foreach to finish the job.

Because I was lazy, I also used PowerShell to create the bulleted list to paste into my blog software.
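A sketch of how that might look, assuming the blog software accepts Markdown-style bullets (an asterisk followed by a space):

```powershell
$bullets = Get-Alias |
    ForEach-Object ResolvedCommand |
    Where-Object Noun -eq 'Object' |
    ForEach-Object Name |
    Sort-Object -Unique |
    ForEach-Object { "* $_" }    # prefix each name with a bullet marker
$bullets
```

Piping $bullets to Set-Clipboard, where that command is available, would put the list straight on the clipboard.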

In general, the aliases are much prettier:

When working interactively in the shell, the short names are really great. But the longer canonical names follow the Verb-Noun convention, which makes them more discoverable. Generally speaking, long-time PowerShell authors also find the longer names to be more readable when writing scripts that will need to be maintained.
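To make the trade-off concrete, here is the same pipeline written both ways. The records are hypothetical and the imdbID values are placeholders.

```powershell
# Hypothetical records; the imdbID values are placeholders.
$movies = @(
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000001' }
    [pscustomobject]@{ Title = 'Avengers Assemble'; imdbID = 'tt0000002' }
    [pscustomobject]@{ Title = 'The Avengers';      imdbID = 'tt0000003' }
)

# Shorthand: quick to type at the prompt.
$short = $movies | where Title -eq 'The Avengers' | foreach imdbID

# Longhand: every command and parameter is spelled out for the next maintainer.
$long = $movies |
    Where-Object -Property Title -EQ -Value 'The Avengers' |
    ForEach-Object -MemberName imdbID
```

Both variables end up holding the same two IDs.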

Bottom line: both the longhand and the shorthand have their place, and you will probably find yourself using the shorthand versions almost exclusively when working interactively at the command line.

One last thing: even though Get-Command gives us a simplified table of the commands that matched, there's way more information inside.
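A sketch of two ways to look under the hood, using ConvertTo-Json as the example command:

```powershell
# Show every property and its value for a single command.
Get-Command ConvertTo-Json | Format-List *

# Or just list the property names that are available.
$propertyNames = Get-Command ConvertTo-Json |
    Get-Member -MemberType Property |
    ForEach-Object Name
$propertyNames
```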

Armed with that information, we can give ourselves a better table of information about the JSON facilities that come with PowerShell.
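For instance, a sketch that picks a handful of properties. The choice of columns is mine; any property of the command objects would work.

```powershell
$jsonTable = Get-Command -Noun Json |
    Select-Object Name, Verb, CommandType, ModuleName
$jsonTable | Format-Table -AutoSize
```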

By now you should be getting the picture: since PowerShell works with collections of objects rather than text files, you can use your basic knowledge on any kind of collection you come across.

As your fundamental skills improve, you'll be able to manipulate not only JSON documents, but also processes, files, and even functions and aliases in the same way!
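As one small illustration, the dedupe trick from earlier works unchanged on running processes. This is only a sketch, and the output depends entirely on what is running on your machine.

```powershell
# One entry per distinct process name, using the same sort-and-unique idea.
$processNames = Get-Process |
    Sort-Object ProcessName -Unique |
    ForEach-Object ProcessName
$processNames
```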

