Outline of a networked device discovery/configuration engine.

The purpose of this document is to describe a network discovery/configuration engine which will –

  1. Locate all devices within given network ranges.
  2. Find out what services a host supports (SMTP, Telnet, etc.).
  3. Find out what community names are supported (read community names as well as write community names).
  4. Find out what key MIB elements the device supports (RMON, RMON II, the transmission MIB, Oracle, etc.).
  5. Find out all interfaces supported by the system, along with each interface's description, status, type and MAC address. (Frame Relay support is not covered here but can be added in the future.)
  6. Maintain a history of changes to discovered elements. For example, if the description of an interface changed, perhaps a card was replaced; if an interface has changed to operationally up, perhaps a new line is being brought up that we might want to monitor.
  7. Be able to issue alarms based on configuration changes.

The document includes algorithms for doing the searches, descriptions of the tables used and their elements, and programming issues.

Background

If you have worked with the major network management products like OpenView or Spectrum from Cabletron, you should be familiar with discovery applications. These applications try to determine your network topology and "help you" draw it in a graphical way. More sophisticated discovery routines like those in the Spectrum product try to determine actual connectivity at the physical layer and pinpoint what hub port an interface is connected to and so on.

These discovery programs usually run in the background and consume few system resources. They have filters to limit the number of packets they put on the network, and address ranges to restrict which addresses are checked.

Notes, Concepts and Rambles

Below in no particular order are some notes about the whole process –

  1. The engine is supposed to run in the background, pretty much all of the time. It should use as few resources as possible and have options to ensure it doesn’t go crazy and generate tons of polls.
  2. Some global configuration options supported by the engine include –
    1. Use a seed file. A seed file can be set up with the names or addresses of key devices on the network (preferably routers) so that they are checked first.
    2. Days and times to exclude from discovery. For example, you can say don’t do discovery from 8 am to 10 am, Monday to Friday. This eliminates discovery traffic during times when it is critical to have maximum bandwidth available for applications.
    3. Number of concurrent networks to check. The discovery process can check multiple networks at the same time.
    4. Delay between polls. This parameter tells the engine how long to wait between systems being discovered.
    5. How often to re-verify the DB. This can be every X hours, days or weeks.
    6. A user-defined process to call when a significant event happens. For example, if a new system is found, execute script XYZ with certain parameters.
  3. The engine should use a database to store its data, as there will be concurrent access. For example, a user can be querying which devices are known to exist on network XYZ while the discovery process is updating some elements on that network.
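The exclusion-window option can be sketched in Python. This is a minimal sketch assuming each window is stored as a set of weekdays plus a start and end time of day; `ExclusionWindow`, `is_excluded` and `seconds_until_allowed` are hypothetical names, not part of the design above.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class ExclusionWindow:
    """Hypothetical record for one "don't do discovery" window."""
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday, as returned by datetime.weekday()
    start: time          # inclusive
    end: time            # exclusive


def is_excluded(now: datetime, windows) -> bool:
    """True if `now` falls inside any configured exclusion window."""
    return any(
        now.weekday() in w.weekdays and w.start <= now.time() < w.end
        for w in windows
    )


def seconds_until_allowed(now: datetime, windows, step=timedelta(minutes=1)) -> float:
    """Seconds a poller should sleep before it may poll again (0.0 if allowed now).

    Steps forward one `step` at a time, so the answer is only accurate to `step`.
    """
    t = now
    for _ in range(7 * 24 * 60):  # give up after scanning a full week of minutes
        if not is_excluded(t, windows):
            return (t - now).total_seconds()
        t += step
    raise ValueError("exclusion windows cover the whole week")
```

For the 8 am to 10 am, Monday to Friday example, the window would be `ExclusionWindow(frozenset(range(5)), time(8), time(10))`, and a poller could sleep for `seconds_until_allowed(datetime.now(), windows)` seconds before checking each device.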

 

Algorithm

Ok, this describes the main process to find devices –

  1. At startup, check whether another instance is running. If it is, exit, as only one master instance should be running.
  2. Start the network pollers. Each network poller checks for all devices within its range. Up to X network pollers can run in parallel (see Number of concurrent networks to check).
  3. When all network pollers are done, if the time elapsed since the start of the process (step 1) is less than the re-verification interval (How often to re-verify the DB), sleep for the difference.
  4. Start again with step 1.
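A minimal sketch of the main process follows. It assumes a POSIX system (it uses `fcntl.flock` on a lock file to enforce a single master instance) and a caller-supplied `poll_network` function that handles one network; all function names are hypothetical. It reads step 3 as "sleep until the re-verification interval has elapsed since the cycle started". Because the lock is held for the life of the process, the loop goes back to starting the pollers rather than re-running the instance check.

```python
import fcntl  # POSIX only
import time
from concurrent.futures import ThreadPoolExecutor


def acquire_single_instance_lock(path):
    """Step 1: return an open, locked file, or None if another instance holds the lock."""
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
    except OSError:
        f.close()
        return None
    return f  # keep this object alive; closing it releases the lock


def run_cycle(networks, poll_network, max_pollers):
    """Step 2: run up to `max_pollers` network pollers in parallel and wait for all."""
    with ThreadPoolExecutor(max_workers=max_pollers) as pool:
        return list(pool.map(poll_network, networks))


def remaining_sleep(cycle_start, now, reverify_interval):
    """Step 3: seconds left before the next cycle may start (never negative)."""
    return max(0.0, reverify_interval - (now - cycle_start))


def main_loop(networks, poll_network, max_pollers, reverify_interval, lock_path):
    lock = acquire_single_instance_lock(lock_path)
    if lock is None:
        return  # another master instance is already running
    while True:  # step 4: start again
        start = time.monotonic()
        run_cycle(networks, poll_network, max_pollers)
        time.sleep(remaining_sleep(start, time.monotonic(), reverify_interval))
```

A thread pool fits here because pollers spend most of their time waiting on the network, not on the CPU.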

For each poller, here is its process –

  1. Loop over all addresses in the specified network range (taking the network mask into consideration).
  2. If we should not be checking now (for example, during an excluded day and time), sleep until the appropriate time.
  3. Ping the device.
  4. If the device is not up –
    1. If the device is in the DB of existing systems, update its Last Check and Times Checked fields.
    2. Log it in the History table.
    3. Continue with step 1.
  5. If the device is up –
    1. If the device is not in the DB –
      1. Create it and log it in the History table.
      2. Fill in Date Found, Address and Network.
      3. Fill in Device Name with the DNS name if DNS sends back a reply. If not, just use the IP address.
    2. Check for SNMP (see below).
    3. Check for OIDs (see below).
    4. Check for services (see below).
    5. Check for interfaces (see below).
  6. Go to step 1.
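The per-network poller can be sketched as below, under several assumptions: the DB is stood in for by a dict keyed by IP address, and `is_up` (the ping), `resolve_name`, `log`, `wait_until_allowed` and the `checks` list are hypothetical callables supplied by the caller, since sending raw ICMP typically needs elevated privileges. Python's standard `ipaddress` module handles the network mask. Beyond the steps above, it also updates the check counters and Last Time ICMP Check OK for devices that answer, which the Device table implies.

```python
import ipaddress
import socket
from datetime import datetime


def dns_name(ip):
    """One possible `resolve_name`: reverse DNS lookup, or None if there is no reply."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def poll_network(address, mask, db, log, is_up, resolve_name,
                 checks=(), wait_until_allowed=lambda: None):
    net = ipaddress.IPv4Network(f"{address}/{mask}")  # e.g. "192.168.1.0/255.255.255.0"
    for host in net.hosts():                          # every usable address in the range
        ip = str(host)
        wait_until_allowed()                          # honour exclusion windows
        now = datetime.now()
        rec = db.get(ip)
        if not is_up(ip):                             # device did not answer the ping
            if rec is not None:
                rec["last_check"] = now
                rec["times_checked"] += 1
            log(f"{ip} did not answer ping")
            continue
        if rec is None:                               # device is up but not yet in the DB
            rec = db[ip] = {
                "address": ip,
                "network": str(net.network_address),
                "date_found": now,
                "device_name": resolve_name(ip) or ip,
                "times_checked": 0,
            }
            log(f"new device {ip}")
        rec["last_check"] = now
        rec["times_checked"] += 1
        rec["last_icmp_ok"] = now
        for check in checks:                          # SNMP, OIDs, services, interfaces
            check(rec)
```

In use, `resolve_name` could be `dns_name`, and `checks` would hold the four checks described below.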

Check for SNMP

  1. If Read Comm Name and Write Comm Name are null, then –
    1. Check all of the names in the Community Names to Check table until one is found. Also, log it in the History table.
    2. If none is found, exit.
  2. Recheck Read Comm Name and Write Comm Name. If they fail, clear them and go back to step 1 to recheck all community names.
  3. Update Last Time SNMP Check OK with the current date and time.
  4. Update SNMP Supported with Y.
  5. Get sysObjectID, sysLocation and sysContact and update the SysOID, SysLocation and SysContact fields.
  6. Return.
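The community-name logic can be sketched as follows. This is a minimal sketch not tied to any SNMP library: `can_read`, `can_write` and `snmp_get` are hypothetical callables the caller would implement with whatever SNMP stack is used, and candidates come from the Community Names to Check table as (name, type) pairs. It also marks SNMP Supported as N when no read community works, which is an assumption. The system-group OIDs are the standard MIB-II ones.

```python
from datetime import datetime

# Standard MIB-II system group instances (RFC 1213).
SYS_OBJECT_ID = "1.3.6.1.2.1.1.2.0"
SYS_CONTACT = "1.3.6.1.2.1.1.4.0"
SYS_LOCATION = "1.3.6.1.2.1.1.6.0"


def find_community(ip, candidates, kinds, probe):
    """Return the first candidate whose type is in `kinds` and that passes `probe`."""
    for name, comm_type in candidates:
        if comm_type in kinds and probe(ip, name):
            return name
    return None


def check_snmp(rec, candidates, can_read, can_write, snmp_get, log):
    ip = rec["address"]
    for field, kinds, probe in (("read_comm", ("R", "RW"), can_read),
                                ("write_comm", ("W", "RW"), can_write)):
        name = rec.get(field)
        if name and probe(ip, name):
            continue                      # stored name still works
        rec[field] = find_community(ip, candidates, kinds, probe)  # recheck all names
        if rec[field]:
            log(f"{ip}: {field} found")
    if not rec.get("read_comm"):
        rec["snmp_supported"] = "N"       # gets need a read name (assumption)
        return
    rec["snmp_supported"] = "Y"
    rec["last_snmp_ok"] = datetime.now()
    for field, oid in (("sys_oid", SYS_OBJECT_ID),
                       ("sys_contact", SYS_CONTACT),
                       ("sys_location", SYS_LOCATION)):
        rec[field] = snmp_get(ip, rec["read_comm"], oid)
```

The log message deliberately records only which name was found, not the community string itself.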

Check for OIDs supported

  1. If SNMP Supported is not Y, exit.
  2. For each entry in the Supported OIDs for device table, reverify that the OID exists by doing a get. If the get is successful, update the Last Time OID Check OK field with the current date and time.
  3. If the OIDs to check table contains OIDs that we have not yet checked on this device, check whether they exist.
  4. For each one that exists, add an entry with the OID Name and update its Last Time OID Check OK field.
  5. Return.
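A minimal sketch of the OID check, assuming `supported` maps OID Name to Last Time OID Check OK for one device, `oids_to_check` maps OID Name to OID Value, and `oid_exists` is a hypothetical SNMP probe. A plain get on a subtree root such as RMON usually fails because the base OID is not itself an instance; one common approach is to do a get-next and check that the returned OID is still inside the subtree, which the hypothetical `in_subtree` helper shows.

```python
from datetime import datetime


def in_subtree(returned_oid, base_oid):
    """True if a get-next on `base_oid` returned something inside that subtree."""
    return returned_oid == base_oid or returned_oid.startswith(base_oid + ".")


def check_oids(rec, supported, oids_to_check, oid_exists, now=None):
    if rec.get("snmp_supported") != "Y":
        return
    now = now or datetime.now()
    for name in list(supported):                      # reverify OIDs already on record
        value = oids_to_check.get(name)
        if value is not None and oid_exists(value):
            supported[name] = now
    for name, value in oids_to_check.items():         # try OIDs not yet on record
        if name not in supported and oid_exists(value):
            supported[name] = now
```

Comparing on a trailing "." matters: without it, 1.3.6.1.2.1.160 would wrongly count as part of 1.3.6.1.2.1.16.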

Check for supported services

  1. For each entry in the Supported Services table for the device, reverify that the service exists by connecting to its port. If the connection is successful, update the Last Time Serv Check OK field with the current date and time.
  2. If the Services to check for table contains services that we have not yet checked on this device, check whether they exist.
  3. For each one that exists, add an entry with the Service Name and update its Last Time Serv Check OK field.
  4. Return.
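Checking a service by connecting to its port can be done with a plain TCP connect using Python's `socket` module. This minimal sketch assumes `supported` maps Service Name to Last Time Serv Check OK for one device and `services_to_check` maps Service Name to Port; it only tests TCP services, and the 2-second timeout is an arbitrary choice.

```python
import socket
from datetime import datetime


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def check_services(address, supported, services_to_check, probe=port_open, now=None):
    now = now or datetime.now()
    for name in list(supported):                    # reverify services already on record
        port = services_to_check.get(name)
        if port is not None and probe(address, port):
            supported[name] = now
    for name, port in services_to_check.items():    # try services not yet on record
        if name not in supported and probe(address, port):
            supported[name] = now
```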

Check for interfaces supported

  1. If SNMP Supported is not Y, exit.
  2. For each entry in the Interfaces Supported table, reverify that the interface exists by doing a get.
  3. If the get is successful, update the Last Time If Check OK field with the current date and time.
  4. If any of the fields changed (Ex. Interf Desc or Interf Speed), update the field and log the change in the History table.
  5. For any additional interface found for which there is no entry on record, add a new interface entry.
  6. Return.
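The change-detection part of the interface check can be sketched as a comparison between the stored rows and a fresh poll. This minimal sketch assumes both are dicts keyed by interface number (ifIndex) whose values hold the fields of the Interfaces Supported table, under hypothetical field names. Producing `polled` (for example by walking ifDescr, ifType, ifSpeed, ifPhysAddress and ifAdminStatus in the ifTable, plus the IP address table for interface addresses) is left to the caller.

```python
from datetime import datetime

FIELDS = ("descr", "type", "speed", "ip_address", "phys_address", "status")


def check_interfaces(stored, polled, log, now=None):
    """Update `stored` from `polled`, logging every new interface and changed field."""
    now = now or datetime.now()
    for index, current in polled.items():
        old = stored.get(index)
        if old is None:                         # interface not on record yet
            stored[index] = dict(current, last_ok=now)
            log(f"new interface {index}")
            continue
        for field in FIELDS:                    # compare each tracked field
            if old.get(field) != current.get(field):
                log(f"interface {index}: {field} changed "
                    f"from {old.get(field)!r} to {current.get(field)!r}")
                old[field] = current.get(field)
        old["last_ok"] = now
```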

 

Issues – Next Phase

  1. This algorithm doesn’t try to understand systems with multiple interfaces (Ex. any router), so each interface is included as a separate system. One possible solution: at the end of a full discovery, look for duplicate entries in the interface table and try to eliminate the duplicate system entries. This also means that when a new IP address responds, the interfaces table should be checked to see whether the address belongs to an existing system. One last question: what IP address should the system object be given? The loopback address? (If so, how do we find it?)
  2. It would be nice if the system understood the concept of Frame Relay links, perhaps with a new table that holds FR circuits and the entries at both ends. It is unclear whether these can be found automatically through discovery.
  3. It would be nice if we could find how systems are connected. Ex. What hub (or switch) port does an interface connect to? How does a switch connect to a hub? A full-blown connectivity process may be too difficult to do, but just knowing which hub/switch an interface connects to could be very useful.
  4. How to handle DHCP? Perhaps have address exclusion ranges? (For example, addresses .100 to .200 are DHCP addresses that we should not even try to find.)
  5. How about trying to discover which SNMP versions are supported? Ex. v1, v2c or v3.
  6. How about having some standard MIB OIDs that apps can check to see whether some functions are supported by a system? For example, have OIDs for MIB-II, RMON (all levels), RMON-Stats (just the statistics group), etc. This would be useful for add-on apps.
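The first idea (merging systems that are really one multi-interface device) could start from a check like the one below. It is a heuristic sketch with hypothetical names, assuming `devices` maps Device ID to the address the device was discovered at and `interfaces` lists (DeviceLink, IP Address) rows from the Interfaces Supported table. When several devices report the same interface address, the lowest Device ID is arbitrarily treated as the real system.

```python
def find_duplicate_systems(devices, interfaces):
    """Map each duplicate Device ID to the Device ID that should absorb it."""
    owner = {}
    for device_id, ip in interfaces:
        if ip:                                         # unnumbered interfaces have no IP
            owner[ip] = min(owner.get(ip, device_id), device_id)
    merges = {}
    for device_id, address in devices.items():
        other = owner.get(address)
        if other is not None and other != device_id:   # address belongs to another system
            merges[device_id] = other
    return merges
```

For example, a router found once at 10.0.0.1 (device 1) and again at 10.0.1.1 (device 2) reports both addresses in its interface table, so device 2 would be folded into device 1.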

Tables Used

This section describes the data tables used and their elements.

Configuration Tables

This table contains the names and addresses of the networks we want the engine to discover all devices on.

Element | Type | Desc
Network Name | Char | The name we give this network (Ex. "Florida", "Accounting Building 10").
Network Address | IP Address | The network address (Ex. "192.168.1.0" or "192.178.0.0").
Network Mask | IP Address | The mask for the network (Ex. "255.255.255.0" or "255.255.255.248").
Notes | Char | End user provided notes.

Community Names to Check

This table contains a list of community names that you want the engine to check for every time that a device is polled.

Element | Type | Desc
Comm. Name | Char | Community name to check (Ex. "public", "secret", etc.).
Type | Char ("R", "W" or "RW") | Type of check to perform: R is for read-only, W is for write, and RW is for both read and write.

Services to check for

List of services to check for.

Element | Type | Desc
Service Name | Char | The name of the service (Ex. HTTP or Telnet).
Port | Number | The port number the service listens on.

OIDs to check

This table contains a list of all the OIDs that the user is interested in having the engine check for.

Element | Type | Desc
OID Name | Char (Unique) | The English name we give this OID (Ex. "RMON", "Cabletron Hub").
OID Value | Char | The actual OID to check for existence (Ex. 1.3.6.1.4.99.3.2.1).
Notes | Char | End user provided notes.
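One way to hold the configuration tables is a small SQLite schema, sketched below with Python's built-in `sqlite3` module. The table and column names are hypothetical translations of the element names above. `add_network` shows one possible validation step: it rejects masks whose one-bits are not contiguous (such as 255.255.255.247) and network addresses with host bits set.

```python
import ipaddress
import sqlite3

CONFIG_SCHEMA = """
CREATE TABLE networks (
    network_name    TEXT NOT NULL,
    network_address TEXT NOT NULL,
    network_mask    TEXT NOT NULL,
    notes           TEXT
);
CREATE TABLE comm_names_to_check (
    comm_name TEXT NOT NULL,
    comm_type TEXT NOT NULL CHECK (comm_type IN ('R', 'W', 'RW'))
);
CREATE TABLE services_to_check (
    service_name TEXT NOT NULL,
    port         INTEGER NOT NULL CHECK (port BETWEEN 1 AND 65535)
);
CREATE TABLE oids_to_check (
    oid_name  TEXT PRIMARY KEY,  -- "Char (Unique)"
    oid_value TEXT NOT NULL,
    notes     TEXT
);
"""


def is_valid_netmask(mask):
    """True if the mask's one-bits are contiguous from the left (e.g. 255.255.255.248)."""
    inverted = ~int(ipaddress.IPv4Address(mask)) & 0xFFFFFFFF
    return (inverted & (inverted + 1)) == 0


def add_network(conn, name, address, mask, notes=None):
    if not is_valid_netmask(mask):
        raise ValueError(f"not a valid network mask: {mask}")
    ipaddress.IPv4Network(f"{address}/{mask}")  # raises ValueError if host bits are set
    conn.execute(
        "INSERT INTO networks (network_name, network_address, network_mask, notes) "
        "VALUES (?, ?, ?, ?)",
        (name, address, mask, notes),
    )
```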

 

Data Tables

Device Table

The table that holds all of the device information.

Element | Type | Desc
Device ID | Number (Unique) | A unique number that identifies this device.
Device Name | Char | The name of the device: the SNMP name if available, otherwise the DNS name of the IP address found, otherwise the IP address itself.
Address | IP Address | The IP address of the device.
Network | IP Address | The IP network the device belongs to (Ex. "205.143.103.0").
Date Found | Date Time | When the device was found for the first time.
Found By | Char | The engine that found it (multiple engines can be used to discover an environment).
Last Check | Date Time | The last time the device was checked or verified.
Times Checked | Number | The number of times we have tried checking the device.
Last Time ICMP Check OK | Date Time | The last time we were able to verify through ICMP that the device was up.
SNMP Supported | Char (Y or N) | Whether the device answers SNMP. (An N may also just mean we lack a valid community name, etc.)
Last Time SNMP Check OK | Date Time | The last time we were able to verify that SNMP was working OK.
SysOID | Char | The system object ID of the device (sysObjectID, from the system group).
SysContact | Char | The system contact field (sysContact).
SysLocation | Char | The system location (sysLocation).
Read Comm Name | Char | The read community name.
Write Comm Name | Char | The write community name.
Notes | Char | End user supplied notes.

Supported OIDs for device

This table records which OIDs a device supports.

Element | Type | Desc
DeviceLink | Number | A number that links the record to a given device.
OID Name | Char | The OID supported (see the OIDs to check table).
Last Time OID Check OK | Date Time | The last date we were able to check that this OID exists on the system.

Supported Services

This table records the services a given device supports.

Element | Type | Desc
DeviceLink | Number | A number that links this record to a given device.
Service Name | Char | The name of the service supported (see the Services to check for table).
Last Time Serv Check OK | Date Time | The last time that the service was verified as responding.

Interfaces Supported

Element | Type | Desc
DeviceLink | Number | A number that links this record to a given device.
Interf Number | Number | The index number of the interface (ifIndex).
Interf Desc | Char | The ifDescr field.
Interf Type | Char | The type of the interface (similar to ifType values).
Interf Speed | Number | The speed of the interface (as in ifSpeed, in bits per second).
IP Address | IP Address | The IP address of the interface (can be blank if this is an unnumbered interface!).
Phys Address | Char | The physical address of this interface (MAC or other).
Interf Status | Char | The desired status of the interface (up, down or testing, as in ifAdminStatus).
Last Time If Check OK | Date Time | The last time we were able to verify the interface.
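The data tables could be sketched as SQLite tables in the same way. This is an illustrative schema only: the column names are hypothetical, dates are stored as ISO-8601 text because SQLite has no dedicated date type, and the link tables use DeviceLink as a foreign key to Device ID so that deleting a device also deletes its OID, service and interface rows.

```python
import sqlite3

DATA_SCHEMA = """
CREATE TABLE devices (
    device_id       INTEGER PRIMARY KEY,
    device_name     TEXT,
    address         TEXT NOT NULL,
    network         TEXT NOT NULL,
    date_found      TEXT NOT NULL,
    found_by        TEXT,
    last_check      TEXT,
    times_checked   INTEGER NOT NULL DEFAULT 0,
    last_icmp_ok    TEXT,
    snmp_supported  TEXT CHECK (snmp_supported IN ('Y', 'N')),
    last_snmp_ok    TEXT,
    sys_oid         TEXT,
    sys_contact     TEXT,
    sys_location    TEXT,
    read_comm_name  TEXT,
    write_comm_name TEXT,
    notes           TEXT
);
CREATE TABLE supported_oids (
    device_link INTEGER NOT NULL REFERENCES devices(device_id) ON DELETE CASCADE,
    oid_name    TEXT NOT NULL,
    last_oid_ok TEXT,
    PRIMARY KEY (device_link, oid_name)
);
CREATE TABLE supported_services (
    device_link  INTEGER NOT NULL REFERENCES devices(device_id) ON DELETE CASCADE,
    service_name TEXT NOT NULL,
    last_serv_ok TEXT,
    PRIMARY KEY (device_link, service_name)
);
CREATE TABLE interfaces (
    device_link  INTEGER NOT NULL REFERENCES devices(device_id) ON DELETE CASCADE,
    if_number    INTEGER NOT NULL,
    if_desc      TEXT,
    if_type      TEXT,
    if_speed     INTEGER,
    ip_address   TEXT,                -- NULL for unnumbered interfaces
    phys_address TEXT,
    if_status    TEXT,
    last_if_ok   TEXT,
    PRIMARY KEY (device_link, if_number)
);
"""


def open_data_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(DATA_SCHEMA)
    return conn
```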

 

History Table

This table holds a history of what the discovery system is doing. A system-wide setting specifies how many records to keep before old ones are purged.

Element | Type | Desc
Entry Number | Number | An integer that keeps increasing.
Date and Time | Date and Time | Date and time of the entry.
System | Char | The system affected. Blank if no system is affected (Ex. start of a discovery poll).
Level | Number | A number that represents the detail of the information being logged (see the examples below).
Text | Char | A text message describing what the logged info means.

Some example levels are:

1 – Start up and shutdown of pollers and agent.

2 – Errors of any kind (mangled SNMP, etc.).

3 – New system found.

Etc.
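Logging to the History table with a retention limit might look like this in SQLite. It is a sketch assuming the system-wide limit is passed in as `keep` (the default of 10,000 is arbitrary); the level constants mirror the example levels, and all names are hypothetical.

```python
import sqlite3
from datetime import datetime

LEVEL_POLLER = 1      # start up and shutdown of pollers and agent
LEVEL_ERROR = 2       # errors of any kind
LEVEL_NEW_SYSTEM = 3  # new system found

HISTORY_SCHEMA = """
CREATE TABLE IF NOT EXISTS history (
    entry_number INTEGER PRIMARY KEY AUTOINCREMENT,  -- keeps increasing, never reused
    logged_at    TEXT NOT NULL,
    system_name  TEXT,                               -- NULL when no system is affected
    level        INTEGER NOT NULL,
    message      TEXT NOT NULL
);
"""


def log_history(conn, level, message, system_name=None, keep=10_000):
    """Append one entry, then purge everything older than the newest `keep` entries."""
    conn.execute(
        "INSERT INTO history (logged_at, system_name, level, message) VALUES (?, ?, ?, ?)",
        (datetime.now().isoformat(timespec="seconds"), system_name, level, message),
    )
    conn.execute(
        "DELETE FROM history WHERE entry_number NOT IN "
        "(SELECT entry_number FROM history ORDER BY entry_number DESC LIMIT ?)",
        (keep,),
    )
    conn.commit()
```

Purging on every insert keeps the table bounded without a separate cleanup job; a busier engine might purge only every N inserts instead.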