Say what? Analysing speech at SAS

I rarely get to talk about my work at SAS since mostly it’s experimental research and development, and therefore kept fairly hush hush. Recently, however, I had the opportunity to write a guest article for data-informed.com about a project I’ve been working on for the past few months.

Having partnered up with a Scottish speech-to-text company, we built a system which could take reams of audio, transcribe them, and then perform some pretty sophisticated text analysis to work out the people and locations mentioned in the audio (and what their connections were), as well as identifying any number of discussion topics.

The article itself is very high level, so you don’t need to be technical to understand it. It’s an introduction to the concept, and some of the applications for the method.

You can read the article here:

Speech to Analytics
http://data-informed.com/text-analytics-could-unlock-strategic-value-hidden-speech/

How to invoke a SAS macro stored in a catalog

Having not done the Advanced Base SAS certification, this was a nightmare to work out. I’m documenting it here for my own future use, and to help anyone else who finds themselves in the same situation as me.

What situation was that?

SAS Social Network Analysis can create networks from input data, and to do so it makes use of a pre-compiled “link macro” which is bundled with SNA. This link macro needs to be invoked from a base SAS program, but to do that, you need to tell SAS where to find it.

Note – There were literally zero Google hits for the exact name of this link macro. In case you’re curious, it’s called %sfs_net_main_link_macros.

Anyway I eventually found the location of these macros, in a catalog file.

Note – Not easy to find, and not documented. If anyone is in the same situation as me, it was in my <SASHome>\SASFoundation\9.3\snamva\macros folder. The macros are compiled into the sasmacr.sasb7cat file.

So I have a catalog file, how do I invoke the macro held in it?

Once you know, it’s very very simple.

  1. Copy the catalog file into your working directory
  2. Add a libname statement pointing to this working directory
  3. Use the SASMSTORE option

In other words your code should have the following statements:

libname mylib "D:\mylocation";
OPTIONS MSTORED SASMSTORE=mylib;

This will make the next invocation of the macro succeed, since SAS now knows it’s in your libname directory.
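If you want to check that SAS can actually see the compiled macros before invoking one, a minimal sketch like the one below may help. It assumes the catalog file copied into D:\mylocation is the sasmacr.sasb7cat mentioned above, which SAS addresses as the catalog mylib.sasmacr. PROC CATALOG is just one way of listing the entries, not something the SNA documentation prescribes.

```sas
/* Point a libref at the folder holding the copied catalog file */
libname mylib "D:\mylocation";

/* Tell SAS to look for stored compiled macros in that library */
options mstored sasmstore=mylib;

/* List the entries in the catalog; the link macro should appear
   as an entry of type MACRO */
proc catalog catalog=mylib.sasmacr;
   contents;
run;
quit;
```

If the macro shows up in the listing, the invocation that follows should resolve it from the catalog.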

Removing duplicate rows in base SAS

If you ever need to remove duplicate rows from a SAS dataset, here’s an approach I use quite often.

Get your data.

Let’s assume it’s in the following format:

ID Name
123 John
456 Bob
123 John

Sort your data.

/* Step 1 - Sort data */
proc sort data=my_lib.my_dataset;

   /* Sort by a field which you want to be unique, 
   and which will be the same for duplicate rows */
   by id; 

run;

Which should give you the following:

ID Name
123 John
123 John
456 Bob


Remove Duplicates

Now that the data is in order, we can remove the duplicates, by only ever keeping the first entry which matches our unique ID.

/* Step 2 - Get rid of duplicates */
data my_lib.my_dataset;

   /* Iterate through this dataset row by row */
   set my_lib.my_dataset;
   /* Grouping each row by the field we sorted on */ 
   by id; 
   /* And only keep a row if it’s the first */
   if first.id; 

run;

ID Name
123 John
456 Bob


Tadaa!

What happened there?

This approach has 3 facets:

  1. Grouping
  2. SAS’ special first.variable
  3. SAS’ feature of only appending (or “outputting”) a row to a dataset if it passes every subsetting IF statement (an IF with no THEN).


Grouping: So we effectively rearranged our data so that all identical IDs were grouped together. Given that the rows are identical and you only want to keep one of them, we choose to keep the first of each group.

First.variable: During execution, SAS iterates through each row of my_dataset, adding it to a new dataset (with which it will eventually overwrite my_dataset). During each iteration, if it hits the first row in a group of rows sharing an ID value (for example, 123), it sets first.id to true. On the next row, because it has already seen the value 123, first.id is set to false. This gives us a handy flag which is only ever switched on for the first row of each ID.

Funny Statement Stuff: So how do we tell SAS to keep the row when this value is true? The statement “if first.id;” is what SAS calls a subsetting IF: an IF with a condition but no THEN. When the condition evaluates to false for a row, SAS stops processing that row and doesn’t output it, i.e. in this case, it doesn’t keep it.

So in simple terms, we’re saying – if you’ve seen this value before, don’t save it again.
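For comparison, PROC SORT can do the sort and the de-duplication in one step with its NODUPKEY option. This is an alternative to the two-step approach above rather than a description of it. The dataset names work.people and work.people_unique are made up for the sketch, and the sample rows mirror the example table.

```sas
/* Hypothetical sample data matching the example above */
data work.people;
   input id name $;
   datalines;
123 John
456 Bob
123 John
;
run;

/* NODUPKEY keeps the first row for each value of the BY variable
   and drops any later rows with the same id */
proc sort data=work.people out=work.people_unique nodupkey;
   by id;
run;

/* Show what is left */
proc print data=work.people_unique noobs;
run;
```

Writing to a separate OUT= dataset leaves the original untouched, which is handy while you are still checking that the right rows are being dropped.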

SAS Dashboard – Fixing the “too many dials” issue

There’s an interesting feature of SAS BI Dashboard that caught me out when trying to put together some KPI gauges.

We wanted a set of dials that would show us the counts for several types of public security offences. Naturally we configured indicators using count as the measure, but found that the dashboard was showing an indicator per record in our system:

[Screenshot: kpi1]

With a bit of faffing around, I eventually found that this was due to the columns I had selected from my data. Specifically, the “uniqueID” column. This led to my big discovery:

The number of dials = The number of unique combinations of column values

In other words, because I had selected a column where the value was always unique, I got a dial for every row in my table. If I selected just the “category” column, I got the aggregated view I expected:

[Screenshot: kpi2]

Perhaps this just shows my SAS Dashboard naivety, but I thought I’d document it anyway.
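The same behaviour is easy to reproduce in Base SAS, which may make the rule easier to picture. The sketch below uses a made-up table, work.offences, with a uniqueID and a category column. It only illustrates the grouping rule and says nothing about how BI Dashboard queries its data internally.

```sas
/* Hypothetical data: one row per offence record */
data work.offences;
   input uniqueID category $;
   datalines;
1 Theft
2 Theft
3 Assault
;
run;

proc sql;
   /* Grouping by category alone: one aggregated count per category */
   select category, count(*) as offence_count
   from work.offences
   group by category;

   /* Adding the unique column: every row becomes its own group,
      so each count is 1 and there is one result per record */
   select uniqueID, category, count(*) as offence_count
   from work.offences
   group by uniqueID, category;
quit;
```

The first query corresponds to the aggregated dials, the second to the “too many dials” view.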

 

SAS – How to Export/Import packages

It’s also possible to import/export metadata with the Wizard equivalent

My team and I have been developing a solution which involves a degree of SAS reports and related metadata. I set up a scheduled, automated backup of our information maps, reports, etc for posterity, and out of general paranoia. For this I used SAS’s command line export and import capabilities, which probably weren’t designed to be used that way, but which turned out to be really useful.

It took a wee bit of trial and error, so I thought I’d document it here.

(Handy reference link)

Note: If you’re puttying into your SAS server, make sure that your putty session has “Enable X11” ticked.

 Export

(Ignore any new lines in the text below – I’ve added those for readability)

/usr/local/SASHome/SASPlatformObjectFramework/9.3/ExportPackage -host "mysasmachine" -port 8561 -user myuser@saspw -password mypassword -package "myPackage.spk" -objects "/Shared Data/mySourceFolder(Folder)" -includeDep -subprop

  • -includeDep means that all objects that the export depends on are also exported
  • You can also specify “-types” with the types of files you wish to export
  • Without specifying “(Folder)” on mySourceFolder, all files will be exported “flat” i.e. without their folder hierarchy

 Import

/usr/local/SASHome/SASPlatformObjectFramework/9.3/ImportPackage -host "mysasmachine" -port 8561 -user myuser@saspw -password mypassword -package "myPackage.spk" -target "/Shared Data/myTargetFolder(Folder)" -subprop myPackage.subprop

  •  Without specifying “(Folder)” on myTargetFolder it would create a new folder with the name of the old parent folder in the new parent folder (e.g. /myTargetFolder/mySourceFolder)

Converting custom date formats in SAS Information Map Studio

Background

Let’s assume you have some dates in a custom format:

  date20131007 month102013

SAS Reports need to be able to a) present dates in a human readable format and b) understand dates to allow filtering and other funky stuff.

For that reason we need a way of translating these custom dates into SAS dates.

Step 1 – Get an Information Map with a date field

Use an existing one, or see http://support.sas.com/kb/35/471.html for more information on creating an Information Map.

Step 2 – Edit the Expression of your date field

  1. Open the Properties for your date field.
  2. Then on the Definition tab, click “Edit” in the “Expression Settings” section.

Step 3 – Magic

  1. Change the Type to Date so that SAS can treat it as a date field from now on.
  2. Then change the Expression Text to something like this:
input(substr(<<mytable.mydatefield>>,5,8), yymmdd8.)

wat

What’s happening here is that we’re translating the custom date string into a SAS date using what’s known as an INFORMAT.

SUBSTR (string, 5, 8) – Takes a substring from the given string, starting at character 5, with a length of 8 characters. In other words, extracting 20131007 from the string date20131007.

INPUT (string, yymmdd8.) – Takes a string and interprets it using the informat “yymmdd8.”. That informat is provided by default with SAS. What we’re doing here is saying to SAS “Here’s a string, but I want you to start treating it as a date. So that you know which part is the year, and which part is the month etc, use this informat as a guide”. Then SAS can know that dd = 07, mm = 10, and yy = 2013.

We’ve effectively translated a string in custom format into a SAS date.
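To see the conversion outside Information Map Studio, here is a minimal DATA step sketch. The variable names raw and sas_date are made up, and the literal string stands in for <<mytable.mydatefield>>.

```sas
data _null_;
   /* A custom-format date string like the one in the example */
   raw = "date20131007";

   /* Same expression as in the Information Map: skip the 4-character
      prefix, then read the remaining 8 characters as yyyymmdd */
   sas_date = input(substr(raw, 5, 8), yymmdd8.);

   /* Write the result to the log in a readable form */
   put sas_date= date9.;   /* log shows: sas_date=07OCT2013 */
run;
```

Internally sas_date is just a number (days since 1 January 1960), which is why a format is needed to display it.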

My mind is blown. Now what?

Interpreting the dates as dates means we can now make it human-friendly in our reports, and also allows us to do some excellent SAS-native date filtering.

Formatting

What we specified earlier was an INFORMAT – in other words an interpretation format. What we can do, now that the date is stored as a SAS date, is specify an OUTPUT format, so that we can represent the date in a variety of ways in our reports.

  1. Go back to the Properties dialog for your date field
  2. On the Classifications tab is a “Formats” section. Change the “Format Type” to “Date/Time” and look at the available formats.

Selecting a format will take your date and represent it as a different string depending on the format you choose. Some examples:

Format Output
DATE 07OCT13
DAY 7
WEEKDAY 2
MONNAME October
DDMMYY 07/10/13
DOWNAME Monday

Filtering

We can also now use SAS to filter dates in a very cool way. For example I can now filter all records which were created in 2013, or all records created after a certain date, or on a certain date, etc.

  1. Create a new Filter and choose your date field as the Data Item.
  2. Then set your Condition to “Year to date” – this will filter all your results to only show ones where the date falls between 1 Jan 2013 and today.
  3. Click OK
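As a rough Base SAS equivalent of the “Year to date” condition, assuming it means what is described above (1 January of the current year up to today), a WHERE statement along these lines should select the same rows. The names mylib.mytable and created_date are hypothetical, and created_date is assumed to hold a SAS date rather than a datetime.

```sas
data work.year_to_date;
   set mylib.mytable;

   /* intnx('year', today(), 0, 'b') returns 1 January of the current year,
      so this keeps rows dated from then up to and including today */
   where created_date between intnx('year', today(), 0, 'b') and today();
run;
```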

A note on filtering

It’s always preferable to apply a filter at the Information Map level, rather than typing in a manual filter when you’re creating your Web Report. A filter on the Information Map will mean the data is filtered at the source, rather than decoding a bunch of information and only filtering once we get to the report level.*

* My understanding is that this is only with certain databases. Some allow optimisation by passing through filtering into the queries they make against the source databases. Still a good practice if you can do it.

How to restart SAS server

Photo by nim CC BY

More detailed instructions can be found here, but below are the steps I use.

Note – All instructions assume SAS is installed on a Linux box, in /usr/local/, since that’s where it was installed on my machine.

Stop all JBoss server instances

/usr/local/jboss-5.1.0.GA/bin/SASServer1.sh stop

Stop SAS processes

/usr/local/config/Lev1/sas.servers stop

Start SAS processes

/usr/local/config/Lev1/sas.servers start

(wait a couple of minutes)

Start JBoss server instances

/usr/local/jboss-5.1.0.GA/bin/SASServer1.sh start

(wait about 5 to 10 minutes)

Creating SAS Web Reports from an Oracle data source

SAS Visual Analytics
Misleading screenshot of SAS Visual Analytics – but it looks much sexier than Web Report Studio, doesn’t it? 🙂

Here’s a sparse set of instructions for the steps you need to take to configure SAS to access data from an Oracle data store and surface it through SAS Web Report Studio (as an example).

Note that once it’s in a SAS format this can be used in any SAS product, and therefore means you can do all sorts of analytics on it. Hence the VA screenshot.

Caveats – Please note the purpose of this is not to break down every area step by step, but more to highlight the various areas that you’ll need to configure to get it all working together. See this as an orchestrator’s guide – I’ll try to document the areas that are necessary for a full orchestra, but I won’t tell you how to play each instrument! As such, these steps assume a working Oracle database setup, as well as the necessary SAS servers ready and waiting to be used.

I might go into more detail for each particular step in later posts.

Management Console

  1. Open Management Console

SAS User

First off, choose a SAS user who will access the Oracle data (e.g. SAS Demo User).

  1. Go to User Manager
  2. Right click a SAS user
  3. Go to Properties > Accounts
  4. Click “New”
  5. Create a new Authentication Domain called “MyAuthDomain” and enter your Oracle RDBMS main user’s login details (e.g. myUser)
  6. You’ve now tied a SAS user to an Oracle user for a specific domain

Server

Next we create a server to represent where all the data is coming from (i.e. your MIE machine)

  1. Right click Server Manager
  2. Select “New Server”
  3. Select “Oracle Server”
  4. Enter a name (myserver)
  5. Set the “Associated Machine” to your SAS machine
  6. Set “Path” to the string you would use to connect to it if you were using SQL*Plus
  7. In other words, <ORACLE MACHINE>:<PORT>/<SERVICE NAME>
  8. e.g. ‘myoraclemachine.domain:1521/myservicename’
  9. Important – use single quotes around the whole thing
  10. Change Authentication Domain to “MyAuthDomain”

New Library

Now create a Library to represent a certain subset of your data – it’s like a logical grouping, and the library will be a lot more publicly accessible so the name must be good.

  1. Expand “Data Library Manager”
  2. Right Click “Libraries”
  3. Select “New Library”
  4. Select “Oracle Library”
  5. Choose a name (mytbls)
  6. Set the location to /Shared Data/<new folder of your choice> (create a new folder for your tables)
  7. Choose “SASApp”
  8. Set “Libref” to what your publicly accessible name should be – it can be at most 8 characters!
  9. Set “Database Server” to your server name (myserver)
  10. Set “Connection” to myserver (should be there already)

Register Library Tables

All you’ve done is create an empty library, pointing at your server. You need to actually choose the tables you want to use in this library.

  1. Click “Libraries”
  2. Right click your new Library (mytbls)
  3. Select “Register Tables”
  4. A dialog will appear telling you to log into your SAS machine… enter your SAS user details, NOT your Oracle user details (always keep a close eye on which machine it’s asking you to log in to – other times it will ask for Oracle credentials)
  5. Choose whatever tables you want to register in here
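If you want to sanity-check the Oracle connection from a Base SAS session before going any further, a sketch like this can help. It is not what Management Console generates. It simply reuses the example values from the steps above (the mytbls libref, the path string and the myUser account), the password is a placeholder, and it assumes SAS/ACCESS Interface to Oracle is installed.

```sas
/* Connect to Oracle using the same path string as the server definition */
libname mytbls oracle
   path='myoraclemachine.domain:1521/myservicename'
   user=myUser
   password="XXXXXXXX";   /* placeholder, not a real password */

/* List the Oracle tables visible through the library,
   without printing the details of each one */
proc contents data=mytbls._all_ nods;
run;
```

If the libref fails to assign here, it is usually quicker to fix the path or credentials at this level than to debug it through the Register Tables dialog.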

SAS Information Map Studio

Next we need to set up an Information Map to expose the view of the table(s) we want to use in reports.

  1. Open Information Map Studio
  2. Expand SASApp in the left hand side column to see your server
  3. Expand your server
  4. Choose any table or tables you want to use in your Information Map
  5. Choose any fields from those tables
  6. IMPORTANT – SAS will by default set the width of those fields to 32K characters. Give these a more sensible limit, otherwise this will break Web Report Studio (and potentially other things)
  7. Add any filters you want on those fields
  8. Make any Joins between tables you need
  9. etc
  10. Save your Information Map to Shared Data

SAS Web Report Studio

Now we have our data view exposed, we can build reports off of it.

  1. Open Web Report Studio – http://mysasmachine.domain:8080/SASWebReportStudio/
  2. Open the Report Wizard
  3. Select the Information Map you created as your data source
  4. etc etc
  5. Save the report