Wednesday, January 18, 2012

ODBC and IANAAppCodePage

Understanding how to get NLS data into Infosphere DataStage using ODBC Enterprise Stage and ODBC Connector Stage

The Infosphere DataStage parallel canvas has two string data types: string and ustring. string is used when char/varchar is selected on the DataStage Designer canvas; ustring is used when nchar/nvarchar is selected, or when char/varchar is selected with the extended option Unicode. The two types carry data differently. The string data type carries data in a non-Unicode character set, which can be a single-byte code page such as ISO-8859 or a multi-byte code page such as SJIS. The ustring data type carries data as two-byte Unicode. The default conversion from string to ustring and vice versa is governed by the NLS map defined at the job level, falling back to the project default if no job-level setting is made.

When the ODBC Enterprise stage is used as a source, the schema for char/varchar columns is determined as the string data type. If char/varchar columns contain NLS data in a character set other than ISO-8859-1, the job may produce invalid data due to conversion issues. Since the column is defined as char/varchar in the table, the ODBC Enterprise Stage will determine its type as SQL_CHAR and read it as SQL_C_CHAR. When the data is read as char, IANAAppCodePage determines the code page in which the application receives the data. This is easiest to understand when retrieving data from DB2 on z/OS: since z/OS uses EBCDIC and Infosphere DataStage is an ASCII application, the data must be converted from EBCDIC to ASCII. This conversion is performed by the ODBC driver based on IANAAppCodePage. For EBCDIC Latin characters (code page 37), the default IANAAppCodePage=4 is sufficient. For other EBCDIC NLS character sets, the equivalent ASCII character set must be determined and IANAAppCodePage updated accordingly.
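As an illustration, IANAAppCodePage is set per DSN in the .odbc.ini file. The DSN name and driver path in the sketch below are hypothetical placeholders; only the IANAAppCodePage line reflects the discussion above:

```shell
# Write a sketch DSN entry to a scratch file; the driver path is a placeholder.
# IANAAppCodePage=4 requests ISO-8859-1 (IANA MIBenum 4) on the application side.
ini=$(mktemp)
cat > "$ini" <<'EOF'
[DB2ZOS_SRC]
Driver=/path/to/db2/odbc/driver.so
IANAAppCodePage=4
EOF
grep '^IANAAppCodePage' "$ini"   # prints: IANAAppCodePage=4
```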

For the ODBC Connector, selecting nchar/nvarchar in the GUI causes the column binding to become SQL_C_WCHAR. When the column binding is SQL_C_WCHAR, the ODBC driver converts all data to Unicode, so IANAAppCodePage is not used. For the ODBC Enterprise Stage, selecting nchar/nvarchar in the GUI does not change the column binding to SQL_C_WCHAR, because the stage does not receive the column definition entered in the GUI.

A related DSN option for NLS data is CharsetFor65535. It must be set in the DSN for DB2 in order to read character data stored in a column defined as CHAR/VARCHAR CCSID 65535. When a column is defined as CCSID 65535, the DB2 database does not store the data in any particular character set; the binary values submitted by the end user are stored as-is. When reading, the data is returned as binary unless CharsetFor65535 is set in the DSN. The option tells the ODBC driver how to interpret the data so that it can be retrieved as char/varchar.
For example, if a table on DB2 z/OS has a column with CCSID 65535 that contains data in EBCDIC Latin code page 37, then the DSN has to contain:
CharsetFor65535=2028 (for Progress ODBC driver version 6.0 and above)
IANAAppCodePage=4

This is required for both the ODBC Connector and the ODBC Enterprise Stage; otherwise the column cannot be read as char/varchar, since the database can return the data only as binary.

Points to note:
  1. The ODBC Enterprise Stage in a source context determines the column definition of the query/table using SQLDescribeCol. It does not receive the column definition entered in the GUI. An environment variable, APT_ODBC_USE_USTRINGS, can force all char/varchar columns to be read as Unicode when set, but individual columns cannot be controlled. This restriction is removed in the ODBC Connector stage.
  2. The ODBC Enterprise Stage and ODBC Connector stage do not perform any conversion internally. Conversion is performed by the ODBC driver using IANAAppCodePage and CharsetFor65535. Hence the NLS map entered in the DataStage GUI does not affect the column data; it is used only to convert table, query, and column names to Unicode and vice versa.
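As a sketch, the environment variable from point 1 can be defined in dsenv or as a job/project environment variable. The value 1 below is an assumption, since the stage only checks whether the variable is set:

```shell
# Force ODBC Enterprise to read all char/varchar columns as Unicode (ustring);
# the value shown is arbitrary - the variable only needs to be defined/set.
export APT_ODBC_USE_USTRINGS=1
echo "${APT_ODBC_USE_USTRINGS:+set}"   # prints: set
```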


Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM. 
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.


Wednesday, January 4, 2012

Third Party ODBC driver failure on Infosphere DataStage 8.0.1 on Solaris

Any application providing an ODBC interface has the disadvantage of dealing with a myriad of driver libraries. These libraries may have been built with different compiler versions and different compiler options. An ODBC driver library compiled with an older or newer version of the platform compiler, or with different compiler options, can be incompatible with the driver manager and/or the application providing the ODBC interface.

One such incompatibility arises on the 32-bit Solaris platform with Infosphere DataStage 8.0.1. The problem does not manifest with 64-bit Solaris libraries, and therefore not with Infosphere DataStage 8.1 and above, so this article may be a little late in coming.

ODBC drivers that are not fully ANSI compliant will not work correctly with Infosphere DataStage 8.0.1, which is 32-bit (the problem is not seen with Infosphere DataStage 8.1 and later, as the libraries are 64-bit and ANSI compliant). Such drivers will typically cause a SIGSEGV in iostream during loading or during connection. To be fully ANSI compliant, the driver must be compiled with the compiler option -compat=5 or equivalent (refer to the CC man page). The driver must also not have a dependency on or reference to any library that is not ANSI compliant, e.g. libC.so.5.

The problem is usually manifested by the ODBC Enterprise stage failing with SIGSEGV and dumping core. 

Example: stack trace from the Teradata-provided ODBC driver version 12, 32-bit Solaris.

=>[1] unsafe_ostream::flush(0xfef0ea30, 0x0, 0xfc360000, 0x0, 0x0, 0xfef0ea3c), at 0xfeeeb5dc
  [2] ostream::flush(0x1, 0xffbfc768, 0x8, 0x8, 0x0, 0xfef0ea2c), at 0xfeeec744
  [3] flush(0xfef0ea2c, 0x0, 0x15, 0xff1baa54, 0x2328, 0x2000), at 0xfeeebc58
  [4] ostream::operator<<(0xfef0ea2c, 0xfeeebc54, 0xff1baa54, 0xff1ad1a8, 0x0, 0xfeeebc54), at 0x22cec
  [5] APT_ProcessManager::executePlayer(0x15da94, 0xfddec7bc, 0x1b36c8, 0xfee8bdc8, 0xffbfcb38, 0xffbfcb54), at 0xfea62b7c
  [6] APT_ProcessManager::executeSequentialPlayer(0xffbfd050, 0xfee3c5b0, 0xfee8c4e8, 0xff1b53c4, 0xfee8c1b8, 0xfee8eb0c), at 0xfea59530
  [7] APT_ProcessManager::executeSequentially(0xfee8dcfc, 0xffbfd1ac, 0x18123c, 0x3c100, 0x2, 0xffbfd460), at 0xfea57c04

Example: stack trace from the Sybase IQ-provided ODBC driver, 32-bit Solaris.

=>[1] __lwp_kill(0x0, 0x6, 0x0, 0x6, 0xfc00, 0x0), at 0xfddc12a4
  [2] raise(0x6, 0x0, 0xfdda4bb8, 0xffffffff, 0xfdde8284, 0x6), at 0xfdd5fe18
  [3] abort(0xffbfbf10, 0x1, 0xfdd46964, 0xa8350, 0xfddeb298, 0x0), at 0xfdd40038
  [4] ut_onsig_sig_handler(0x0, 0x0, 0x0, 0x0, 0xb, 0x0), at 0xfc5e5278
  [5] __sighndlr(0xb, 0xffbfc3c8, 0xffbfc110, 0xfc5e4c80, 0x0, 0x1), at 0xfddc01dc
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [6] ostream::opfx(0xfef0ea88, 0xfb06ea8c, 0xfc160000, 0xfef0ea98, 0x5f, 0x74328), at 0xff0ffab8
  [7] operator<< <unsigned short>(0xfef0ea88, 0xffbfc5ec, 0xfef0ea88, 0x5e, 0xbc, 0xffbfbd6c), at 0xff138e94
  [8] APT_StreamMessageStore::store(0xaa8b8, 0xffbfc5ec, 0x0, 0xa, 0xbc, 0x5e), at 0xff0de580

The problem occurs either because the driver itself is not ANSI compliant (as with the Teradata ODBC driver) or because it depends on a non-ANSI-compliant library such as libC.so.5 (as with the Sybase IQ ODBC driver).

To make the driver ANSI compliant, it has to be compiled with -compat=5.

To check whether the driver is ANSI compliant, run:

$ nm mylibrary.so

and look at the mangled names in the output.

If the mangled names start with __0 (double-underbar zero), the code was compiled in -compat=4 mode. Example: __0FDfooi
If the mangled names start with __1 (double-underbar one), the code was compiled in standard (-compat=5) mode. Example: __1cDfoo6Fi_i_
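This check can be scripted. The snippet below runs against the two sample symbol names shown above; on a real system, replace the here-string with the actual output of nm mylibrary.so:

```shell
# Simulated nm symbol names; -compat=4 names start with __0, -compat=5 with __1.
symbols='__0FDfooi
__1cDfoo6Fi_i_'
compat4=$(printf '%s\n' "$symbols" | grep -c '^__0')
compat5=$(printf '%s\n' "$symbols" | grep -c '^__1')
echo "compat4=$compat4 compat5=$compat5"   # prints: compat4=1 compat5=1
```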

This test readily identified the Teradata ODBC driver, which itself was not ANSI compliant. Sometimes it can be harder to determine whether there is a compatibility issue, as with the Sybase IQ ODBC driver, which itself had correct mangled names but depended on libC.so.5, which is not ANSI compliant.
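A hidden dependency like the Sybase IQ driver's on libC.so.5 can be spotted with ldd. The snippet below uses simulated ldd output to keep it self-contained; on a real system you would pipe the output of ldd mylibrary.so directly into the grep:

```shell
# Simulated 'ldd mylibrary.so' output; flag any dependency on pre-ANSI libC.so.5.
ldd_out='libC.so.5 => /usr/lib/libC.so.5
libc.so.1 => /usr/lib/libc.so.1'
if printf '%s\n' "$ldd_out" | grep -q 'libC\.so\.5'; then
    echo "non-ANSI dependency found"   # prints: non-ANSI dependency found
fi
```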


Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM. 
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.

Monday, December 19, 2011

Enabling connectivity from Infosphere DataStage to Lotus Notes database using ODBC

IBM Lotus Notes is an integrated desktop client for accessing email and other applications on an IBM Lotus Domino server. A lot of important business information is managed in Lotus Notes databases these days, so many customers need to process this data for business analytics. Infosphere DataStage supports connectivity to Lotus Notes (Domino) databases through the ODBC interface.

The Lotus Notes ODBC driver is only available for the Windows platform, so only Infosphere DataStage installed on Windows can connect to a Lotus Notes database through the ODBC interface. If your Infosphere DataStage is on a Unix machine, you can use web services hosted on the Domino server to get data into DataStage; a step-by-step guide on this is available on IBM developerWorks.
[Tools like Attunity/iWay could also possibly be used to establish connectivity. Exploring them in later posts.]

The procedure assumes Infosphere DataStage version 8.5 and Lotus Notes ODBC driver version 8.5 and is described for the ODBC Enterprise Stage. Other stages, such as the ODBC Connector, can also be used but may have some problems. This procedure assumes that the user doing the configuration has administrative privileges.

Configure Lotus Notes ODBC Driver on DataStage server


  • Download and install the Lotus Notes client, which is a prerequisite for the Lotus Notes ODBC driver.
  • Download and install the Lotus Notes ODBC driver, also known as the Lotus NotesSQL driver.
  • Configure the Lotus NotesSQL Authentication List Manager with the ID that you want to use.
  • Connect to the database you want to use with the Lotus Notes client to confirm connectivity. This step seems unnecessary, but some files appear to be created or updated when the Lotus client first connects to a database.

Configure DataStage server to interact with Desktop

  • Open the DataStage dsrpc service in Administrative Tools->Services and set 'Allow service to interact with desktop'.


  • Define the environment variable DSE_ODBC_NOTESQL_BEHAVIOR and set it to 1.
  • Restart DataStage.
  • Create a System DSN for the Lotus Notes ODBC driver.
  • Create an ODBC Enterprise job with the DSN created. Set the user and password to the Lotus Notes ID and password configured in the Lotus NotesSQL Authentication List Manager.


Troubleshooting Tips

  • If the job fails with the error "Driver's SQLAllocHandle on SQL_HANDLE_DBC failed.", locate the notes.ini file on your system and give the 'Everyone' user permission to read it.
  • For other errors, refer to the NotesSQL Help at Lotus Applications->NotesSQL->NotesSQL Help.
  • This procedure is in no way complete. Although it works for most cases, you may still get different errors related to security and permissions, depending on Windows security, user privileges, etc. For such errors please refer to Microsoft help for resolution.

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM.
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.


Monday, December 12, 2011

Enabling connectivity to Ingres from IBM Infosphere DataStage

Ingres is an open source RDBMS provided by Actian Corporation. Infosphere DataStage connectivity to the Ingres database is provided through the ODBC interface. InfoSphere DataStage has comprehensive support for ODBC through the ODBC Stage on the server canvas as well as the ODBC Enterprise Stage and ODBC Connector Stage on the parallel canvas.

Following are the steps required to connect to Ingres from Infosphere DataStage. The example is for the AIX platform using the ODBC Enterprise Stage.

1. Install Ingres.
2. In your Infosphere DataStage job, set the environment variables II_SYSTEM, PATH and LIBPATH to include Ingres:
   II_SYSTEM=/opt/Ingres/IngresII
   PATH=$II_SYSTEM/ingres/bin:$II_SYSTEM/ingres/utility:$PATH
   LIBPATH=/opt/Ingres/IngresII/ingres/lib:/opt/Ingres/IngresII/ingres/lib/lp64:$LIBPATH

3. Link libiiodbcdriver.1.a to libiiodbcdriver.1.so:
   ln -s libiiodbcdriver.1.a libiiodbcdriver.1.so
[Note: Infosphere DataStage uses the DataDirect (now Progress Software) driver manager, which looks for *.so libraries, hence the need for the link.]
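The link in step 3 can be created and verified as below. The demonstration runs in a scratch directory against a stand-in file; on a real install you would run the ln command inside $II_SYSTEM/ingres/lib:

```shell
# Demonstrate the symlink the driver manager needs, using a scratch directory.
dir=$(mktemp -d)
cd "$dir"
touch libiiodbcdriver.1.a                  # stand-in for the real Ingres archive
ln -s libiiodbcdriver.1.a libiiodbcdriver.1.so
[ -L libiiodbcdriver.1.so ] && echo "symlink created"   # prints: symlink created
```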

4. Edit the .odbc.ini to add a DSN entry as follows (remember to set DriverUnicodeType to 1):
[TEST]
Driver=/opt/Ingres/IngresII/ingres/lib/libiiodbcdriver.1.so
Description=
Vendor=Ingres Corporation
DriverType=Ingres
DriverUnicodeType=1
Server=<hostname>
Database=<database name>
ServerType=Ingres
PromptUIDPWD=N
WithOption=
RoleName=
RolePWD=
DisableCatUnderscore=N
AllowProcedureUpdate=N
UseSysTables=N
BlankDate=N
Date1582=N
CatConnect=N
Numeric_overflow=N
SupportIIDECIMAL=N
CatSchemaNULL=N
ReadOnly=N
SelectLoops=N
ConvertThreePartNames=N
Group=


5. Create a parallel job with the ODBC Enterprise Stage. Enter the DSN, user and password. The user and password to use are those of the Ingres administrator who owns the instance; this is set during installation and usually defaults to ingres.

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM.
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.

Monday, December 5, 2011

Bulk loading data to a remote Sybase IQ 15.X server using IBM Infosphere DataStage

Prior to Sybase IQ version 15.X, bulk loading of data was restricted to local systems because of the mechanism of the LOAD TABLE command.

With the release of Sybase IQ 15, a new feature, client-side load, has been introduced which allows the LOAD TABLE command to pull data from remote data files.

Support for this feature has been built into the Sybase Enterprise Stage and the Sybase IQ 12 Bulk Load stage to give Infosphere DataStage users the ability to load data to remote Sybase IQ servers. While the Sybase IQ 12 Bulk Load Stage requires minimal configuration to set this up, the Sybase Enterprise Stage requires quite a few steps to enable bulk loading to a remote Sybase IQ server. The advantage of the Sybase Enterprise Stage, however, is that it uses FIFOs as data files, making it much faster than the Sybase IQ 12 Bulk Load stage and therefore worth the configuration effort.

Enabling bulk loading to a remote Sybase IQ Server 15.X using Sybase Enterprise Stage

Prerequisites for using the Sybase Enterprise Stage to load data into a table on a remote Sybase IQ server:

1. Check that the Sybase IQ server version is 15.1 ESD1 or above: the capability to load a table from remote data files that are FIFOs is only available from Sybase IQ 15.1 ESD1 onwards. As the Sybase Enterprise Stage uses data files that are FIFOs, the functionality can only be supported when connecting to Sybase IQ 15.1 ESD1 or later.

2. Install Sybase Open Client SDK 15.5 on the Infosphere DataStage engine tier and configure the $SYBASE/interfaces file to add the remote Sybase IQ server IP address and port number: the Sybase Enterprise Stage is based on the Sybase Open Client APIs. However, the remote client load feature is only enabled for the ODBC interface to Sybase IQ. Hence the Sybase Enterprise Stage converts the supplied Sybase IQ server name to an equivalent ODBC connection string. For this it gets the CS_CONNECTED_ADDR option for the connection using ct_con_props.

3. Install the Sybase IQ network client 15.1 ESD1: this software installs the Sybase IQ ODBC driver and the dbisql utility, which supports the new client-side load feature.

Enable the remote client load feature on the Sybase IQ 15.X server

Once the prerequisites are met, the following needs to be done to enable the remote client load feature on the Sybase IQ server.

1. Enable the allow_read_client_file and allow_write_client_file options of the Sybase IQ 15.X server. Set these options once on every IQ server, using the isql or dbisql utility:

   set option allow_read_client_file=on
   set option allow_write_client_file=on
   GRANT READCLIENTFILE TO <group | user>

Configure Sybase Enterprise Stage to perform bulk load to a remote Sybase IQ 15.X server

1. Use the script $DSHOME/../DSComponents/etc/install.sybase to change the current Sybase Enterprise library version to 15. By default the Sybase Enterprise library version is 12.

2. On the DataStage server, define an environment variable APT_SYBASE_REMOTE_LOAD and set it. When this environment variable is set, the Sybase Enterprise Stage will try to load the given table using the new syntax 'LOAD TABLE <mytable> USING CLIENT FILE'.

3. On the DataStage server, set the environment variables to include Sybase IQ:
SYBASE=<Path to SDK 15.5>
SYBASE_OCS=OCS-15_0
PATH=$SYBASE/$SYBASE_OCS/bin:<Sybase IQ network client>/IQ-15_2/bin
LIBPATH=$SYBASE/$SYBASE_OCS/lib:<Sybase IQ network client>/IQ-15_2/lib
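Step 3 can be expressed as shell exports. The SDK and network-client install paths below are placeholders; substitute the actual locations on your engine tier:

```shell
# Placeholder install paths for the Open Client SDK and IQ network client.
SYBASE=/opt/sybase/SDK15.5
SYBASE_OCS=OCS-15_0
PATH="$SYBASE/$SYBASE_OCS/bin:/opt/sybase/iqclient/IQ-15_2/bin:$PATH"
LIBPATH="$SYBASE/$SYBASE_OCS/lib:/opt/sybase/iqclient/IQ-15_2/lib:$LIBPATH"
export SYBASE SYBASE_OCS PATH LIBPATH
echo "$PATH" | grep -c 'OCS-15_0/bin'   # prints: 1
```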

Now the Sybase Enterprise Stage is ready to load data into the remote Sybase IQ server.

4. Create a parallel job with the Sybase Enterprise Stage as the target stage. Set Write Method=IQ Write.

Enabling bulk loading to a remote Sybase IQ Server 15.X using Sybase IQ 12 Bulk Load Stage

The Sybase IQ 12 Bulk Load Stage uses flat files as data files. Hence it is able to perform bulk loads to a remote Sybase IQ server from version 15.0 onwards.

1. To perform a bulk load to a remote Sybase IQ server, create a job with the Sybase IQ 12 Bulk Load Stage as the target stage and select Load Method=AutoLoad via ODBC. The client-side load feature is only available for the ODBC interface to Sybase IQ, so a bulk load to a remote Sybase IQ server is only possible when Load Method is set to AutoLoad via ODBC.

2. Install the Sybase IQ network client 15.X on Infosphere DataStage and configure an ODBC DSN for the remote Sybase IQ 15.X server. The Sybase IQ 12 Bulk Load Stage checks both the ODBC driver version and the Sybase IQ server version it connects to before using the new client-side load syntax.
For example, a DSN for 64-bit AIX is given below:

[iqdemo]
Driver=/home/sybase/IQ15.1/IQ-15_1/lib64/libdbodbc11.so
Description=SybaseIQ
EngineName=<hostname>_iqdemo
DatabaseName=iqdemo
CommLinks=TCPIP{HOST=<ipaddress>:<port>;}
UID=
PWD=
AutoStop=no
AutoPreCommit=Y
DriverUnicodeType=1

3. The client-side load feature must be enabled on the Sybase IQ server. If the feature is disabled, the Sybase IQ 12 Bulk Load stage will detect it and revert to the normal LOAD TABLE command.
To enable the server options, use the isql or dbisql utility:

   set option allow_read_client_file=on
   set option allow_write_client_file=on
   GRANT READCLIENTFILE TO <group | user>






Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM.
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.

Tuesday, November 29, 2011

Enabling bulk load to a remote IQ server 12.X using IBM Infosphere DataStage

IBM Infosphere DataStage provides rich connectivity to the Sybase® IQ database server through the Sybase Enterprise stage and the Sybase IQ Bulk Load stage. But it can be a problem to bulk load data from Infosphere DataStage to Sybase® IQ version 12.X, especially if the Sybase® IQ server is remote to the Infosphere DataStage server.

Sybase® IQ version 12.X uses the LOAD TABLE command for bulk loading of data. The data file supplied to the LOAD TABLE command must be accessible from the Sybase® IQ 12.X server on which the command runs. When the command is executed remotely, the data file is expected to be present on the Sybase® IQ 12.X server for which the load is issued.
Note: Sybase® IQ version 15 and later has extended LOAD TABLE capability which does not have this restriction. More on this in later posts.

A popular way of loading to a remote Sybase® IQ server from Infosphere DataStage is to use the Sybase IQ 12 Bulk Loader Stage and an NFS-shared partition between the Infosphere DataStage server and the Sybase® IQ server. This stage exports the data to a flat file, and NFS-sharing the flat file between the two servers makes the data file locally available to the remote IQ server. Although this method works, it has drawbacks in terms of performance: the export of data to a flat file and the loading of the table are sequential, done one after the other, and reading the NFS-shared file is subject to performance and network issues.

The Sybase Enterprise stage exports data and bulk loads the table in parallel. However, it uses a FIFO as the data file, and a FIFO cannot be shared using NFS.
It is possible to bulk load to a remote Sybase® IQ server using the Sybase Enterprise stage if the remote Sybase® IQ server is installed on the same platform as the Infosphere DataStage server and hence can be made part of a node pool on the Infosphere DataStage server.
When this node pool is added as a constraint on the Sybase Enterprise stage, the stage runs on the remote Sybase® IQ server, so the data file gets created where it is accessible to the Sybase® IQ server.

Following is a step-by-step method to enable the Sybase Enterprise Stage to bulk load to a remote Sybase® IQ server. The method assumes prior knowledge of Infosphere DataStage configuration and some Sybase IQ administration.


• Install Infosphere DataStage components on the remote Sybase® IQ server and configure rsh etc. as specified in the InfoSphere DataStage documentation to create a grid environment for Infosphere DataStage.
• Install the Sybase SDK on the Infosphere DataStage server and configure the Sybase interfaces file to access the remote Sybase® IQ server. [Note]: The Sybase SDK path on the Infosphere DataStage server should be the same as on the remote Sybase® IQ server; e.g. if the remote Sybase® IQ server has Sybase installed in /home/sybase/iq, then the Sybase SDK on the Infosphere DataStage server should also be installed in that path.
• Create a new configuration file (APT_CONFIG_FILE) on Infosphere DataStage and add the remote Sybase® IQ server as one of the nodes. Add that node to a node pool called "sybase", e.g.:
{
  node "DataStage_Server"
  {
    fastname "DataStage_Server"
    pool "" "b"
    resource scratchdisk "/home/kartgarg/sand751/scratch" { }
    resource disk "/home/kartgarg/sand751/pds_files/node2"
    { pool "" "export" "node2" "node2a" "nodeb" "sasdataset" }
    resource disk "/home/kartgarg/sand751/pds_files/node2"
    { pool "" "export" "node2" "node2b" "nodeb" "sasdataset" }
    resource sas "" { }
  }
  node "DataStage_Server_node2"
  {
    fastname "Remote_IQ_Server"
    pool "" "b" "sybase"
    resource scratchdisk "/home/kartgarg/sand751/scratch" { }
    resource disk "/home/kartgarg/sand751/pds_files/node3"
    { pool "" "export" "node3" "node3a" "nodeb" "sasdataset" }
    resource disk "/home/kartgarg/sand751/pds_files/node3"
    { pool "" "export" "node3" "node3b" "nodeb" "sasdataset" }
    resource sas "" { }
  }
}
• In the job, open the Sybase Enterprise stage, set Write Method to IQ Write, and then select the Stage tab. Under node pool and resource constraint, select the node pool "sybase".
• Compile and run the job.

Points to note

• The method is only applicable if the Sybase® IQ server and the Infosphere DataStage server are installed on the same platform.
• The Sybase® IQ server should reside in its own node pool, and that pool should have one and only one server, i.e. the IQ server.
• The Sybase® IQ server should be accessible from Information Server through ct-library. Hence the Sybase® SDK should be installed on the Information Server, and the interfaces file should have an entry for the remote Sybase® IQ server.
• Using a node map constraint instead of a node pool will not work, as node map constraints are not handled by the Sybase Enterprise stage.

Disclaimer: The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. The methods described here are my own and not necessarily supported by IBM.
IBM, Infosphere and DataStage are registered trademarks of IBM corporation.