Capture Web Page / HTML to JPG


I’m a member of the American Legion, and I’ve been working with them on displaying schedules and similar information as images on screens. For a while I used various programs to capture an image of a website and save it to a JPG file, which got me thinking there had to be a way to do this in a script. This article is about how I did just that.

First, I needed an easy way to bring a web page (HTML) into memory for conversion to a JPG. After much searching I found a handy library called NReco.ImageGenerator. Since it ships as a DLL, I can load it into my PowerShell script with Add-Type:

Add-Type -Path ".\nreco\NReco.ImageGenerator.dll"
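
A relative path like the one above is resolved against the current PowerShell location, not the folder the script lives in, so the load can fail when the script is started from somewhere else. Here is a minimal sketch of one way around that, assuming the DLL sits in an nreco subfolder beside the script. The names $baseDir and $dllPath are made up for this example.

```powershell
# $PSScriptRoot is only populated while a script file runs; fall back to the
# current location when these lines are pasted into a console (an assumption of this sketch)
$baseDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }

# Build the full path to the DLL inside the 'nreco' subfolder
$dllPath = Join-Path (Join-Path $baseDir 'nreco') 'NReco.ImageGenerator.dll'

if (Test-Path -LiteralPath $dllPath) {
    # Load the assembly so its types become available in this session
    Add-Type -Path $dllPath
}
else {
    Write-Warning "NReco.ImageGenerator.dll not found at $dllPath"
}
```

If the warning shows up, check the folder name first; nothing else in the script will work until the DLL loads.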

For simplicity I put the DLL in a subfolder where my script resides. With the DLL imported, it was time to see what it could do for me. According to the article, the DLL will convert HTML to a JPG in one line of code. So I chose to take advantage of Invoke-WebRequest and point it at powershell.org to see if it would save the page:

$html = Invoke-WebRequest -Uri 'https://powershell.org/forums/'
$h2image = New-Object NReco.ImageGenerator.HtmlToImageConverter
$imageFormat = [NReco.ImageGenerator.ImageFormat]::Jpeg
# Pass the HTML text of the response explicitly via .Content
$jpeg = $h2image.GenerateImage($html.Content, $imageFormat)
$dataStream = New-Object System.IO.MemoryStream(,$jpeg)
$img = [System.Drawing.Image]::FromStream($dataStream)
$img.Save('c:\temp\image.jpg')

$h2image is an instance of the converter class from the DLL we pulled in, and it is what lets us convert the web page to a JPG. Depending on the size of the page, the conversion may take a little while to return.

$h2image = New-Object NReco.ImageGenerator.HtmlToImageConverter

The next line of code sets the image format, which tells the DLL what type of file we want to produce. IntelliSense in the ISE shows that there are 3 types included in this enumeration.

For what I needed I chose JPG.

With the converter created and the format chosen, the conversion hands back the image as an array of bytes, which I can now stream into memory:

$dataStream = New-Object System.IO.MemoryStream(,$jpeg)

This one took me a while to figure out. If it hadn’t been for this article I might never have figured it out: http://piers7.blogspot.com/2010/03/3-powershell-array-gotchas.html

The solution for getting the array into a stream is in this tidbit:

Cup(Of T): 3 PowerShell Array Gotchas

The (somewhat counter-intuitive) solution here is to wrap the array – in an array. This is easily done using the ‘comma’ syntax (a comma before any variable creates a length-1 array containing the variable):

PS > $bytes = 0x1,0x2,0x3,0x4,0x5,0x6
PS > $stream = new-object System.IO.MemoryStream (,$bytes)
PS > $stream.length
6
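
To see what the comma is doing, here is a small sketch you can run on its own. The byte values are placeholders standing in for the converter's output, and $wrapped, $wrappedCount, $innerLength and $streamLength are names invented for this example.

```powershell
# Placeholder bytes standing in for the array returned by the converter
[byte[]]$bytes = 0x1, 0x2, 0x3, 0x4, 0x5, 0x6

# The unary comma builds a one-element array whose only item is $bytes
$wrapped = ,$bytes
$wrappedCount = $wrapped.Count       # 1
$innerLength  = $wrapped[0].Length   # 6

# New-Object treats an array as its argument list. Wrapped this way, the list
# holds a single argument (the byte array), so MemoryStream(byte[]) is called
$stream = New-Object System.IO.MemoryStream (,$bytes)
$streamLength = $stream.Length       # 6
$stream.Dispose()
```

Without the comma, New-Object would unroll the six bytes into six separate constructor arguments, and MemoryStream has no constructor that matches.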

Now that I have the image bytes in a stream, I can write them to a file using another .NET class, System.Drawing.Image:

$img = [System.Drawing.Image]::FromStream($dataStream)
$img.Save('c:\temp\image.jpg')
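
Since the bytes that come back from the converter are already an encoded image, a possible shortcut is to skip the stream and System.Drawing and write the bytes straight to disk. This is a sketch rather than what the script in this article does. It uses placeholder bytes standing in for $jpeg and a made-up file name in the temp folder ($outFile and $writtenLength are also invented here).

```powershell
# Placeholder bytes standing in for $jpeg, the converter's output in the article
[byte[]]$jpeg = 0xFF, 0xD8, 0xFF, 0xD9

# Hypothetical output path in the temp folder; the article saves to c:\temp\image.jpg
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'image-sketch.jpg'

# Write the already-encoded bytes to the file as they are
[System.IO.File]::WriteAllBytes($outFile, $jpeg)

# Confirm how many bytes landed on disk
$writtenLength = (Get-Item -LiteralPath $outFile).Length
```

Going through System.Drawing.Image is still the better choice if you want to inspect or resize the picture before saving it.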

And voilà, my web page is saved as a JPG.


Full script:

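# Load the NReco converter, plus System.Drawing for saving the image
Add-Type -Path ".\nreco\NReco.ImageGenerator.dll"
Add-Type -AssemblyName System.Drawing
# Download the page and render its HTML to JPEG bytes
$html = Invoke-WebRequest -Uri 'https://powershell.org/forums/'
$h2image = New-Object NReco.ImageGenerator.HtmlToImageConverter
$imageFormat = [NReco.ImageGenerator.ImageFormat]::Jpeg
$jpeg = $h2image.GenerateImage($html.Content, $imageFormat)
# Wrap the byte array in a stream (note the leading comma) and save it to disk
$dataStream = New-Object System.IO.MemoryStream(,$jpeg)
$img = [System.Drawing.Image]::FromStream($dataStream)
$img.Save('c:\temp\image.jpg')
# Release the image and the stream
$img.Dispose(); $dataStream.Dispose()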

PowerShell Posse // Thom Schumacher – PowerShellPosse / DevOps

I hope this helps someone.

Until next time, keep scripting

Thom

 
