A Wonderfull speech

I just stumbled across this speech. I like it.

http://www.salon.com/books/feature/2009/02/20/haruki_murakami/

been busy…

I have been busy with the new job and lot of other things. I need to start posting again.

Informatica Issue

I got this question as a comment.

Hi,

In informatica one of our mappings are using oracle target table. Daily it runs in production and complete in 10 min. but suddenly its just running and running. we stopped it forcely. what could be the potential cause for its continuous run?

Thanks.

There could be many reasons for this. One that comes to my mind is database statistics. When was the last time you gathered stats? The other reasons could be duplicate keys on the target table or if you have setup your session to collect all kinds of stats and data in the logs (it would just take longer but not a whole lot) or huge lookup source with no calculated cache settings….I could go on with this but it would be useless. I would like you to provide some information. Tell me about the version, the OS, the DB version, and all other architectural details. And tell me a bit about your mapping as well. If you want, go ahead and paste the session log here. I will take a look at it, see if I can find the issue and then remove it from the site.

Kirtan

Oracle VM.

Oracle just announced the release of Oracle Virtualization software, Oracle VM. It sounds all great and fun but it needs two machines, one for installing the server and another for managing it. I don’t have two machines for this. I have two machines but this approach seems like a little too much effort for “virtualization”. How does this make it simple. I am sticking with VMWare for now.

Oracle timestamp in Informatica

To extract values from an oracle table column of timestamp datatype, you need to convert it to a character datatype since the Oracle Timestamp datatype is not currently supported by PowerCenter. To extract the value do the following.

  • Use SQL override and function to_char, provided by oracle, to convert the values of a field to character type before bringing it into PowerCenter mapping. For example,SQL> create table tst1 (tmstmp timestamp);Table created.

    SQL> insert into tst1 values (current_timestamp);

    1 row created.

    SQL> /

    1 row created.

    SQL> /

    1 row created.

    SQL> /

    1 row created.

    SQL> /

    1 row created.

    SQL> commit;

    Commit complete.

    SQL> select * from tst1;

    TMSTMP
    ————————————————-
    10-SEP-07 02.44.14.807930 PM
    10-SEP-07 02.44.16.700373 PM
    10-SEP-07 02.44.17.357597 PM
    10-SEP-07 02.44.18.163876 PM
    10-SEP-07 02.44.18.685926 PM

    SQL> select tmstmp, to_char(tmstmp,’YYYY-MM-DD HH:MM:SS.FF6′) as CHAR_TMSTMP from tst1;

    TMSTMP CHAR_TMSTMP
    —————————- ————————–
    10-SEP-07 02.44.14.807930 PM 2007-09-10 02:09:14.807930
    10-SEP-07 02.44.16.700373 PM 2007-09-10 02:09:16.700373
    10-SEP-07 02.44.17.357597 PM 2007-09-10 02:09:17.357597
    10-SEP-07 02.44.18.163876 PM 2007-09-10 02:09:18.163876
    10-SEP-07 02.44.18.685926 PM 2007-09-10 02:09:18.685926

    The resultant column CHAR_TMSTMP in the output of the query is now ready to be extracted in a mapping.

  • IMPORTANT thing to remember here is to make sure that the column for the timestamp value in the mapping, is defined as String(26).
  • Hope this helps…
    Kirtan

    nvarchar2 & varchar2

    Informatica imports all VARCHAR2 fields as nVARCHAR2. This is a big deal. Especially when you are using update strategy to perform update-else-insert operations on target columns that are of VARCHAR2 datatype. One could expect noticable performance gain after converting all target column definitions for VARCHAR2 columns in Informatica.

    Kirtan

    Cache

    Informatica uses cache files for various transformation wherever it needs to store data throughout the life of the mapping. For all these transformations Informatica uses Index Cache and Data Cache. This is done using either memory alone or using memory and physical disks. When the cache settings are calculated correctly and enough room is provided, Informatica PowerCenter stores data in memory. In cases where not enough size is provided for index and a data cache, Informatica stores as much as it can in the memory and then starts creating cache files on the physical disks. For example, if your cache file was supposed to have 500,000 lines but you only provided room for 100,000 lines through your cache settings, Informatica will write rest (400,000) of the lines into cache files on physical disks. I would think the size of these files depends on the kind of platform you are on.

    In case of Lookup Transformation, Informatica also sorts the data by condition ports sequence. In cases where lookup cache is needed to be written to the physical disk, this sorting operation itself can become time consuming. But this could be advantageous if you can select the rows to be processed from the source qualifier in the same sorted order as the lookup condition ports sequence. So, if cache data for a particular key is in memory, then most probably the data for the adjacent keys will also be in memory. In this case, the time that lookup transformation would take to lookup data would keep increasing for each row as it would have to keep digging deeper in the file for each new row. Eventhough it sounds bad, it’s not it’s better than having data in random order. With appropriate indexes on the source tables, this operation could become very fast.

    Wherever possible, use sorted data for these transformations. It is advisable to count appropriate amount of cache settings for each transformation that uses caching. Also one must consult the DBAs to come up with a good indexing strategy for all source tables.

    Transformations that use caching for their process are…

    Lookup Tranformation
    Joiner Tranformation
    Aggregator Tranformation
    Rank Tranformation
    Sorter Tranformation

    Informatica – Seq. Gen. performance issue

    Someone emailed me yesterday with a “never heard of”, at least for me, Informatica performance issue. The message said that they only have a sequence generator that is adding something to the data in the mapping. Everything else is coming straight from the source. The session took less than a minute to execute without a sequence generator and seven minutes with a sequence generator. It sounded a little odd because normally you would think that seq. gen. doesn’t make too much impact on the performance. But after a little investigation and a screenshot of the transformation properties we found out that a setting called “Number of Cached Values” was causing the issue. The value for it was set to 1 instead of recommended 0 (zero) or some large number. Setting this number to a small value could and would cause some issues for you. This setting is useful when there is a need to keep a unique set of values across mappings that share the same sequence generator. Usually these mappings are ran in parallel and the targets are same across all mappings that use this sequence generator.

    This setting tells Informatica about the number of values that are stored in the repository when the session is started. The way Informatica processes this value is the key. Every time during the life of a session when this value is reached, Informatica queries the repository to get the next value for the Seq. Generator. So if you have this value set to 1, Informatica will query the repository database for every row that asks for a value from this transformation. And as you can imagine, this would definitely add some time to your the life of a session run.

    If you set this number to some higher value like 15000, Informatica will query the repository after every 15000 rows. That may not necessarily be a bad thing. It would not have a major impact on total execution time. If the number is set to zero this setting would no more be a factor. Because then Informatica would NOT store the value of in the repository. Instead, it would be kept in the session memory.

    The sequence number is kept on the server if the value is set to zero. In all other cases where it is set to be 1 or greater, the value is stored in the repository and the Informatica server process retrieves it from the database.

    hope this helps…

    Kirtan

    Pivot Query

    So how to pivot data in a table?
    I was asked about this by someone at work today. Let’s say you have a table that looks like following. I have included create and insert statements so that you can see it and understand it yourself.

    Create table t1 (a number, b number);

    insert into t1 values (1,1);
    insert into t1 values (1,2);
    insert into t1 values (1,3);
    insert into t1 values (2,1);
    insert into t1 values (2,2);
    insert into t1 values (2,3);
    insert into t1 values (3,1);
    insert into t1 values (3,2);
    insert into t1 values (3,3);

    select * from t1 order by 1,2;

    ####
    (A hint aside: You must specify the order in which you want the data to be back. It is completely wrong to say that oracle will return the data in order it was inserted.)
    ####

    A B
    ----- ------
    1 1
    1 2
    1 3
    2 1
    2 2
    2 3
    3 1
    3 2
    3 3

    Now, let’s say you want to retrieve all the records from table t1 in a fashion where all the values in column B that belong to the group Column A should be shown side by side. Something like

    A Val1 Val2 Val3
    -- ---- ---- ----
    1 1 2 3

    One way would be to write a procedural code that stores different values of a group in an array or a series of variables and then displays them on screen or returns them to the calling program.

    Or

    You could just write a simple SQL statement that does it for you. You could write a piece of code that looks like this…

    select
    a
    ,max(decode(b,1,b,null)) col1
    ,max(decode(b,2,b,null)) col2
    ,max(decode(b,3,b,null)) col3
    from t1
    group by a;

    A COL1 COL2 COL3
    —– ——- ——- ——-
    1 1 2 3
    2 1 2 3
    3 1 2 3

    3 rows selected

    … and you won’t have to worry about supporting an object such as a function or a procedure.

    What if you had null values in one of the columns.

    update t1 set b=null where a=2 and b=2;

    One row updated

    select
    a
    ,max(decode(b,1,b,null)) col1
    ,max(decode(b,2,b,null)) col2
    ,max(decode(b,3,b,null)) col3
    from t1
    group by a;

    A COL1 COL2 COL3
    —– ——- ——- ——-
    1 1 2 3
    2 1 3
    3 1 2 3

    3 rows selected

    There are many ways to write pivot queries. This is probably the easiest way of doing it.

    Using Informatica to perform pivot operation.

    You can use the Normalizer Transformation to generate rows out of columns. But to pivot values in rows into columns you would have to use Aggregator Transformation. You have to use the *FIRST* or *LAST* function to achieve this. First or last functions will pick first or last of the incoming rows if there are duplicates. Here is an example…

    Suppose you have a source table with this data that is a record of monthly expenses for each of your departments. create table deptexp (sales_id number(5), mon char(3), amt number(10,2));

    insert into deptexp values (1,’JAN’,100);
    insert into deptexp values (1,’FEB’,120);
    insert into deptexp values (1,’MAR’,135);
    insert into deptexp values (2,’JAN’,110);
    insert into deptexp values (2,’FEB’,130);
    insert into deptexp values (2,’MAR’,120);
    Commit;

    You want to load this data after denormalizing it into this structure.
    create table deptexp1 (DEPT_ID number(5), JAN_AMT number(10,2), FEB_AMT number(10,2), MAR_AMT number(10,2))
    The data after our process would look like…
    DEPT_ID JAN_AMT FEB_AMT MAR_AMT
    1 100 120 135
    2 110 130 120

    Do the following to accomplish this.
    1. Create an Aggregator transformation with the following ports and expressions:
    NAME IN/OUT EXPRESSION GROUP BY
    DEPT_ID IN YES
    MONTH IN NO
    AMOUNT IN NO
    JAN_AMT OUT FIRST(AMOUNT, MONTH='JAN')
    FEB_AMT OUT FIRST(AMOUNT, MONTH='FEB')
    MAR_AMT OUT FIRST(AMOUNT, MONTH='MAR')
    APR_AMT OUT FIRST(AMOUNT, MONTH='APR')
    2. Connect the DEPT_ID, MONTH and the AMOUNT ports from the Source Qualifier to the Aggregator.
    3. Connect the JAN_AMT, FEB_AMT, MAR_AMT, etc. ports to the target.
    And that would do it…

    Hope this helps…

    Kirtan

    Move…

    I just moved from Charlotte, NC to Houston, TX for a new job. I am giving consulting/contracting a (indefinite) rest. I don’t know if I will ever be a consultant again but for now I will be a Full-Time employee working on ’our’ in-house projects. Consulting was fun. I can’t complain at all.

    Let’s see how houston treats us!!!