Informatica uses cache files for various transformations wherever it needs to hold data for the life of the mapping. For all of these transformations, Informatica uses an index cache and a data cache, held either in memory alone or in memory plus physical disk. When the cache settings are calculated correctly and enough room is provided, Informatica PowerCenter keeps the data in memory. When the index and data caches are sized too small, Informatica stores as much as it can in memory and then starts creating cache files on physical disk. For example, if your cache needs to hold 500,000 rows but your cache settings only leave room for 100,000, Informatica writes the remaining 400,000 rows to cache files on disk. I would think the size of these files depends on the kind of platform you are on.
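To make the arithmetic in that example concrete, here is a minimal Python sketch. It is only an illustration of the memory/disk split described above, not Informatica's actual logic; the function name `split_cache` and the idea of measuring capacity in rows are assumptions made for this example.

```python
def split_cache(total_rows, rows_that_fit):
    """Return (rows kept in memory, rows spilled to disk).

    Hypothetical helper: capacity is expressed in rows for simplicity,
    whereas real cache settings are sizes in bytes.
    """
    in_memory = min(total_rows, max(rows_that_fit, 0))
    on_disk = total_rows - in_memory
    return in_memory, on_disk


# The example from the text: 500,000 rows needed, room for 100,000.
in_memory, on_disk = split_cache(500_000, 100_000)
print(in_memory, on_disk)  # 100000 400000
```

If the cache is large enough for every row, the second value is zero and nothing is written to disk.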
In the case of the Lookup transformation, Informatica also sorts the cached data by the lookup condition ports, in sequence. When the lookup cache has to be written to physical disk, this sorting operation can itself become time-consuming. It can work to your advantage, though, if you select the rows to be processed in the Source Qualifier in the same sorted order as the lookup condition ports. Then, if the cache data for a particular key is in memory, the data for the adjacent keys will most probably be in memory as well. In this case the time the Lookup transformation takes to find a row keeps increasing, because it has to dig deeper into the cache file for each new row. Even though that sounds bad, it is still better than receiving the data in random order. With appropriate indexes on the source tables, reading the source in sorted order can become very fast.
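The locality argument can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of Informatica's real cache: it assumes the sorted cache is split into fixed-size pages, that only a few pages fit in memory at once, and that the least recently used page is evicted. The function `count_page_loads` and all the sizes are hypothetical.

```python
import random
from collections import OrderedDict


def count_page_loads(lookup_keys, rows_per_page, pages_in_memory):
    """Count how often a page of the sorted cache must be read from disk.

    Assumption: key k lives on page k // rows_per_page, and memory holds
    at most pages_in_memory pages with least-recently-used eviction.
    """
    resident = OrderedDict()  # page number -> None, ordered by recency
    loads = 0
    for key in lookup_keys:
        page = key // rows_per_page
        if page in resident:
            resident.move_to_end(page)  # page already in memory
        else:
            loads += 1  # page has to be fetched from disk
            resident[page] = None
            if len(resident) > pages_in_memory:
                resident.popitem(last=False)  # evict the oldest page
    return loads


keys = list(range(100_000))

# Input arrives in the same order as the sorted cache.
sorted_loads = count_page_loads(keys, rows_per_page=1_000, pages_in_memory=10)

# Same keys, but arriving in random order.
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
random_loads = count_page_loads(shuffled, rows_per_page=1_000, pages_in_memory=10)

print("sorted input:", sorted_loads, "page loads")
print("random input:", random_loads, "page loads")
```

With sorted input each page is read once, because adjacent keys sit on the same page. With random input most lookups land on a page that is no longer in memory, so the number of disk reads is far higher.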
Wherever possible, use sorted data for these transformations. It is advisable to calculate an appropriate cache size for each transformation that uses caching. Also, consult your DBAs to come up with a good indexing strategy for all source tables.
Transformations that use caching are:
Lookup Transformation
Joiner Transformation
Aggregator Transformation
Rank Transformation
Sorter Transformation
Hi, thanks for sharing this information with all of us. I was trying to understand why sorted input would be faster for a Lookup transformation. I do not recall any option in the Lookup transformation that tells it the input data is sorted. If that's true, how would sorted input help the lookup operation? I assume that for every incoming row it would start the search from the top.
I would appreciate it if you could reply. Thanks in advance.
Can you tell me why it would not help? Would you rather have random data coming in through the source and scan a sorted lookup cache file? (In some cases this might actually work faster.) But with large data and a large lookup, I would rather let the first few (million?) rows pass through faster and then let it scan. Using the index cache, the Lookup transformation would know where to go, no?
Tell me why it would not help (or help).