cleaning up the default management pack

SCOM has what I feel is a major bug: it lets you save items (monitors, rules, overrides, etc.) in the default MP. Doing this is bad for a lot of reasons, and not only does SCOM allow it, it is the default option as well. In my case an occasional lapse of attention let me do exactly that, and removing MPs later then becomes a huge pain in the rear. Anyway, I found this good article on how to clean up the mess.

SCOM: powershell run space failed to start

I have been getting these messages since day 1 and tried various things to resolve them, none of which worked.

Below I am pasting an example alert with its full text so that anyone searching for it will find it. This is one specific alert, but I was having the issue with all non-Microsoft PowerShell scripted discoveries. For me, 99% of them came from the XSNMP SCOM Management Pack. To be clear, the MP was not the cause of the problem; it was just the one that tried (and failed) to run PowerShell the most.

PM

Log Name: Operations Manager
Source: Health Service Modules
Event Number: 22400
Level: 1
Logging Computer:
User: N/A
Description:

Failed to run the PowerShell script due to exception below, this workflow will be unloaded. System.NullReferenceException: Object reference not set to an instance of an object. at System.Environment.GetEnvironmentVariable(String variable, EnvironmentVariableTarget target) at System.Management.Automation.ModuleIntrinsics.GetExpandedEnvironmentVariable(String name, EnvironmentVariableTarget target) at System.Management.Automation.ModuleIntrinsics.SetModulePath() at System.Management.Automation.ExecutionContext.InitializeCommon(AutomationEngine engine, PSHost hostInterface) at System.Management.Automation.AutomationEngine..ctor(PSHost hostInterface, RunspaceConfiguration runspaceConfiguration, InitialSessionState iss) at System.Management.Automation.Runspaces.LocalRunspace.DoOpenHelper() at System.Management.Automation.Runspaces.RunspaceBase.CoreOpen(Boolean syncCall) at Microsoft.EnterpriseManagement.Modules.PowerShell.RunspaceController.RunScript(String scriptName, String scriptBody, Dictionary`2 parameters, PowerShellOutputType outputType, Int32 serializationDepth, IModuleDebug iModuleDebug) at Microsoft.EnterpriseManagement.Modules.PowerShell.PowerShellProbeActionModule.RunScript(RunspaceController runspaceController) Script Name: MemoryPctUtil.ps1 One or more workflows were affected by this. Workflow name: xSNMP.Cisco.Rule.CollectMemoryPoolUtil Instance name: I/O Instance ID: {X} Management group: X

Event Data:

<DataItem type="System.XmlData" time="2010-12-03T19:15:30.1742570-05:00" sourceHealthServiceId="X">
  <EventData>
    <Data>X</Data>
    <Data>xSNMP.Cisco.Rule.CollectMemoryPoolUtil</Data>
    <Data>I/O</Data>
    <Data>{X}</Data>
    <Data>MemoryPctUtil.ps1</Data>
    <Data>300</Data>
    <Data>System.NullReferenceException: Object reference not set to an instance of an object. at System.Environment.GetEnvironmentVariable(String variable, EnvironmentVariableTarget target) at System.Management.Automation.ModuleIntrinsics.GetExpandedEnvironmentVariable(String name, EnvironmentVariableTarget target) at System.Management.Automation.ModuleIntrinsics.SetModulePath() at System.Management.Automation.ExecutionContext.InitializeCommon(AutomationEngine engine, PSHost hostInterface) at System.Management.Automation.AutomationEngine..ctor(PSHost hostInterface, RunspaceConfiguration runspaceConfiguration, InitialSessionState iss) at System.Management.Automation.Runspaces.LocalRunspace.DoOpenHelper() at System.Management.Automation.Runspaces.RunspaceBase.CoreOpen(Boolean syncCall) at Microsoft.EnterpriseManagement.Modules.PowerShell.RunspaceController.RunScript(String scriptName, String scriptBody, Dictionary`2 parameters, PowerShellOutputType outputType, Int32 serializationDepth, IModuleDebug iModuleDebug) at Microsoft.EnterpriseManagement.Modules.PowerShell.PowerShellProbeActionModule.RunScript(RunspaceController runspaceController)</Data>
    <Data />
  </EventData>
</DataItem>

 

Eventually the scripts would time out like this:

Time window start: 12/7/2010 11:57:38 AM
Time window end: 12/7/2010 12:02:37 PM
Time First: 12/7/2010 11:57:38 AM
Time Last: 12/7/2010 11:57:41 AM
Count: 44

Context

Date and Time: 12/7/2010 11:57:41 AM
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 22411
Level: 1
Logging Computer: X
User: N/A
Description:

The PowerShell script will be dropped because the it has been waiting in the queue for more than 10 minutes. Script Name: DiscoverInterfaceName.ps1 One or more workflows were affected by this. Workflow name: xSNMP.Discovery.InterfaceName Instance name: GigabitEthernet2/21 Instance ID: {X} Management group: X

Event Data:

<DataItem type="System.XmlData" time="2010-12-07T11:57:41.4054873-05:00" sourceHealthServiceId="X">
  <EventData>
    <Data>X</Data>
    <Data>xSNMP.Discovery.InterfaceName</Data>
    <Data>GigabitEthernet2/21</Data>
    <Data>{X}</Data>
    <Data>DiscoverInterfaceName.ps1</Data>
    <Data>300</Data>
    <Data>10</Data>
    <Data />
  </EventData>
</DataItem>
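With 44 of these events in a few seconds, it can help to pull out just the workflow and script names rather than reading each payload. The following is a minimal sketch in Python, assuming the event data has been saved as well-formed XML. The function name `summarize_event` is made up for illustration, and the meaning of each `Data` position is inferred by lining the values up with the Description text above, not taken from any SCOM documentation.

```python
import xml.etree.ElementTree as ET

# Sample payload: the 22411 event data from this post, written as well-formed XML.
EVENT_XML = """
<DataItem type="System.XmlData" time="2010-12-07T11:57:41.4054873-05:00" sourceHealthServiceId="X">
  <EventData>
    <Data>X</Data>
    <Data>xSNMP.Discovery.InterfaceName</Data>
    <Data>GigabitEthernet2/21</Data>
    <Data>{X}</Data>
    <Data>DiscoverInterfaceName.ps1</Data>
    <Data>300</Data>
    <Data>10</Data>
    <Data />
  </EventData>
</DataItem>
"""


def summarize_event(xml_text):
    """Return the fields useful for triage as a dict.

    Assumption: the Data elements appear in the order management group,
    workflow, instance, instance ID, script (inferred from the Description).
    """
    root = ET.fromstring(xml_text.strip())
    # Empty <Data /> elements have text None, so fall back to "".
    values = [(d.text or "").strip() for d in root.iter("Data")]
    return {
        "time": root.get("time"),
        "management_group": values[0],
        "workflow": values[1],
        "instance": values[2],
        "instance_id": values[3],
        "script": values[4],
    }


if __name__ == "__main__":
    summary = summarize_event(EVENT_XML)
    print(summary["workflow"], summary["script"])
```

Running this over a batch of exported events and counting by `workflow` would show which discoveries are piling up in the queue.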

 

The issue turned out to be a permissions problem with the health service, since it was trying to run these PowerShell scripts as Local System. This article is the one that finally jogged my memory appropriately. Thank you to the author!

exchange 2010 MP for scom = room for improvement

At first I really liked this MP. It knows a LOT about Exchange, and some serious effort went into making sure it grabs everything. After a while, though, there are some things you need to be able to change but can’t.

Take this alert for disk space. We want to change the percentage that it alerts on. Well, guess what you can change with the override?

image

That’s right, the only thing you can do is enable or disable the rule. That’s it. And while we’re on the subject of disabling a rule, that’s not working for at least this one.

Here are a few of the instances of this alert:

image

And if you look at the overrides, this rule is clearly disabled… but still alerting.

image

I’m still trying to figure both of these out.

weird scom error

I’m getting this repeatedly in SCOM and I’m not sure why. I can’t seem to find anything about it.

  • Event data collection process unable to write data to the Data Warehouse. Failed to store data in the Data Warehouse. The operation will be retried.
    Exception ‘InvalidOperationException’: The given value of type Int32 from the data source cannot be converted to type tinyint of the specified target column.
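For what it’s worth, a SQL Server tinyint only holds 0 through 255, so this message would be consistent with some integer value in the incoming event data falling outside that range. That is only a guess at the cause, and the error does not say which column is involved. Here is a minimal Python sketch of the range check; the column name `level` and the sample rows are hypothetical, purely for illustration.

```python
# SQL Server tinyint is an unsigned 8-bit integer.
TINYINT_MIN, TINYINT_MAX = 0, 255


def fits_tinyint(value):
    """True if an integer value can be stored in a tinyint column."""
    return TINYINT_MIN <= value <= TINYINT_MAX


def find_overflows(rows, column):
    """Return the rows whose `column` value would not fit a tinyint column."""
    return [row for row in rows if not fits_tinyint(row[column])]


# Hypothetical sample data; the real offending column is unknown.
sample_rows = [
    {"event_id": 1, "level": 1},
    {"event_id": 2, "level": 255},
    {"event_id": 3, "level": 256},
    {"event_id": 4, "level": -1},
]

if __name__ == "__main__":
    for row in find_overflows(sample_rows, "level"):
        print("would not fit tinyint:", row)
```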

 

image

urlscan issue

I have the following URLScan configuration:

 

RuleList=DenyUserAgent

 
[DenyUserAgent]
DenyDataSection=AgentStrings
ScanHeaders=User-Agent

[AgentStrings]
Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1
Opera/9.02 (Windows NT 5.1; U; ru)

In the log files I am seeing it block non-Russian Mozilla user agents, like this:

2010-11-05 21:49:23 76.94.140.86 896362 GET /programs/images/t8.jpg Rejected rule+’DenyUserAgent’+triggered User-Agent: mozilla/5.0+(windows;+u;+windows+nt+6.1;+en-us;+rv:1.9.2.3)+gecko/20100401+firefox/3.6.3 mozilla/5.0+(windows

(The log file truncates after a certain length.) I do not understand why it is blocking this Mozilla version, which has a totally different user agent. ???
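One hypothesis worth testing, which I have not confirmed: urlscan.ini uses `;` to start a comment, including after a value on the same line, so each [AgentStrings] entry may be getting cut off at its first semicolon. That would leave `Mozilla/5.0 (Windows` as the effective deny string, which lines up with the `mozilla/5.0+(windows` at the end of the log line. The Python sketch below only simulates that assumption and does not touch URLScan itself. The function names are made up, the case-insensitive substring match is also an assumption, and `BLOCKED_UA` is reconstructed from the lowercased log entry.

```python
def strip_ini_comment(line):
    """Assumption: everything from the first ';' onward is a comment."""
    return line.split(";", 1)[0].strip()


def effective_deny_strings(section_lines):
    """Deny strings as they would look after comment stripping, lowercased."""
    stripped = (strip_ini_comment(line) for line in section_lines)
    return [s.lower() for s in stripped if s]


def first_match(user_agent, deny_strings):
    """Assumption: matching is a case-insensitive substring test."""
    ua = user_agent.lower()
    for deny in deny_strings:
        if deny in ua:
            return deny
    return None


# The two entries from the [AgentStrings] section in this post.
AGENT_STRINGS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1",
    "Opera/9.02 (Windows NT 5.1; U; ru)",
]

# User agent of the rejected request, reconstructed from the log line.
BLOCKED_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us; rv:1.9.2.3) "
    "Gecko/20100401 Firefox/3.6.3"
)

if __name__ == "__main__":
    deny = effective_deny_strings(AGENT_STRINGS)
    print("effective deny strings:", deny)
    print("matched on:", first_match(BLOCKED_UA, deny))
```

If this is what is happening, the fix would be deny strings that contain no semicolons, for example a fragment such as `rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1`, though I have not tried it.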

Looking for an answer on this one….

2008 R2 DC’s and SCOM

We migrated our domain to 2008 AD from 2003 AD last week and found a couple of issues via SCOM.

  • DNSSec Zone TrustAnchors – for this one you’ll get an alert that looks like this: Zone TrustAnchors on DNS Server dns.name is not responding to queries.
  • DNS 2008 Server External Address Resolution Alert – for this one you’ll just get a failure of the alert. To be fair, the same alert for 2003 has the same problem, but we had fixed that so long ago that the details had been forgotten.

That’s it!

Dawson Forest, AKA Georgia Nuclear Aircraft Laboratory

I’m a bit of a conspiracy theorist at heart, and I love stories about secret facilities and whatnot. Today I was reading ajc.com and stumbled across an article that mentioned one, which led me down this path of discovery. To be clear, I don’t think there is really any conspiracy here, but it is interesting to know that a secret test facility is close by.

This is the first thing I found that started this whole bit of research:

http://www.ajc.com/news/atlanta/former-secret-test-site-616831.html?cxtype=rss_news

Dawson Forest is owned by the City of Atlanta and is planned for a water reservoir or a 2nd airport:

http://en.wikipedia.org/wiki/Dawson_Forest

It used to be the site of the Georgia Nuclear Aircraft Laboratory:

http://en.wikipedia.org/wiki/Georgia_Nuclear_Aircraft_Laboratory

This is a link to the history of the testing facility:

http://www.pickensprogress.com/archive/insidedawsonforest.html

Another link to the history, basically a summary:

http://northgeorgiamountainramblings.wordpress.com/2010/04/28/when-the-cold-war-came-to-dawsonville/

YouTube video about the facility:

http://www.youtube.com/watch?v=Bn6N2iV2_os

Here are some Flickr pictures of it (some very good ones):

http://www.flickr.com/photos/robertlz/sets/72157600036376038/detail/

Some more pics in this page:

http://www.abovetopsecret.com/forum/thread230310/pg8

Some dude’s blog post about it:

http://northgeorgiamountainfreak.blogspot.com/2008/05/north-georgias-area-51-in-dawsonville.html

Some more pictures and a map:

http://wiki.worldflicks.org/dawson_forest_-_a.k.a._georgias_area_51.html#coords=%2834.367419,-84.16832%29&z=13

AboveTopSecret.com link with a bunch of stuff:

http://www.abovetopsecret.com/forum/thread230310/pg15

Some videos from a guy who went there:

http://www.youtube.com/user/Ratz667#p/a

Facebook page about it:

http://www.facebook.com/pages/Dawson-Forest-GNAL/154640534555862?v=app_4949752878

Geocache site with some pics:

http://www.geocaching.com/seek/gallery.aspx?guid=40ed95a9-8c89-44da-83d8-130820a25849

Link to a pdf about the radiation measurements:

http://www.gaepd.org/Files_PDF/gaenviron/radiation/radrpt2002_dfw.pdf

Effect on pine beetles:

http://links.jstor.org/pss/2473643

Time Magazine article on the pines:

http://www.time.com/time/magazine/article/0,9171,895712,00.html

Pictures (old ones)

http://www.abovetopsecret.com/forum/thread230310/pg12

More pictures from satellite:

http://virtualglobetrotting.com/map/abandoned-government-lab/view/?service=1

scom web application monitoring part 2 – presenting the data – service levels and the dashboard

This is the 2nd post in a short series on monitoring web applications with SCOM. Part 1 is here.

One of the biggest issues I have with SCOM is the sheer amount of data. It is so easy to grab a parameter here and a value there, and once you throw that in with all of the stuff the management packs already give you, you suddenly have a lot to choose from, and picking and presenting that data becomes the difficult part. Do yourself a favor and don’t show management the SCOM console; it looks more complicated than it is, and I don’t think it presents well except to technical folks.

Creating dashboards is limited, and Microsoft needs to do some more work here. For example, as I mentioned in my previous post, you cannot save what a performance view is supposed to look like, meaning which (or all) counters are checked. I understand why Microsoft did this for the default per-user performance view, but IMO once you create a dashboard view, that becomes impractical, and there should be a way to make the selections a part of the view.

The dashboard also has the problem of not looking too great via the web console. It’s limited and looks kinda fugly. As a result we have tried using the actual SCOM client, installed as a Citrix app, so that we can display it on the flat screen via the Wyse terminal. This has the problem that you can’t set a default view without a lot of work, and we keep running into issues where you need the detail pane in one view but not in another, you sometimes need to be able to select your views on the left-hand side, but you don’t want the “action” pane visible, and you end up with something that looks like a hack.

Microsoft seems to have realized this and has since created a “solution accelerator” called the Service Level Dashboard. I’m not going to go into what it takes to install this because there are already a ton of sites out there that have the info. It isn’t the easiest thing to get installed because it requires a SharePoint installation, which it customizes and bastardizes quite a bit, and it also needs access to the Operations Manager database, the data warehouse, and pretty much everything involving SCOM. In my case it was easier to put the actual SharePoint install on my SCOM server, which I did, and I ended up having to figure out why SharePoint stepped all over my SCOM website. This wasn’t rocket science but it took some effort. If I were doing it over again, I would install SharePoint before installing SCOM, or find it a home somewhere that isn’t on the SCOM RMS.

Once you go through the motions of getting SharePoint and the Service Level Dashboard installed, we can get to work.

I ran out of time today so it looks like this will be a 3 part post.