Saturday, September 22, 2007

Reading files in a directory

Fairly typical task, one can find some java based solutions at AskTom website. This post illustrates how to to get the listing using relatively modern extensions of Oracle RDBMS: dbms_scheduler and external tables. I'm using 10.2.0.3.0 running on RH Linux.

The idea is simple: we execute an external OS command and then get its output back via external table interface.

First thing we have to do is to come up with a command, I'm using a simple shell script:

#!/bin/sh
[ -z "$1" -o -z "$2" ] && exit 1
/usr/bin/find "$1" -type f -maxdepth 1 -ctime -1 -printf "%f %CY%Cm%Cd%CH%CM%CS\n" > "$2"


As you can see the script accepts two arguments: first is a directory, second a file that then will be queried as an external table.

Then we create an external table:

CREATE OR REPLACE DIRECTORY TMP AS '/tmp'
/
CREATE TABLE get_dir_context (
  filename    VARCHAR2(1024)
, ctime       DATE
)
ORGANIZATION EXTERNAL
(TYPE oracle_loader
 DEFAULT DIRECTORY tmp
 ACCESS PARAMETERS
 (
  RECORDS DELIMITED BY newline
  FIELDS TERMINATED BY " "
  (
    filename   CHAR(1024)
  , ctime      DATE "YYYYMMDDHH24MISS"
  )
 )
 LOCATION ('listing.txt')
)
REJECT LIMIT UNLIMITED
/


And eventually, execute the external command using dbms_scheduler:

DECLARE
  l_job_name                  VARCHAR2(30) := 'GET_DIR_CONTENT';
  l_command                   VARCHAR2(30) := '/tmp/get_dir_content.sh';
  l_directory                 VARCHAR2(1024) := '/tmp';
  l_outputfile                VARCHAR2(1024) := '/tmp/listing.txt';
BEGIN
  dbms_scheduler.create_job(
    job_name          => l_job_name
  , job_type          => 'EXECUTABLE'
  , job_action        => l_command
  , enabled           => FALSE
  , number_of_arguments => 2
  , auto_drop         => TRUE
  );
  dbms_scheduler.set_job_argument_value(
    job_name          => l_job_name
  , argument_position => 1
  , argument_value    => l_directory
  );
  dbms_scheduler.set_job_argument_value(
    job_name          => l_job_name
  , argument_position => 2
  , argument_value    => l_outputfile
  );
  dbms_scheduler.enable(l_job_name); 
END;
/


One can use these commands to monitor the execution and fetch the content:

SELECT job_name, status, run_duration, actual_start_date, additional_info
  FROM user_scheduler_job_run_details;

COLUMN filename FORMAT A30
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY/MM/DD HH24:MI:SS';
SELECT * FROM get_dir_context;

EXEC dbms_scheduler.purge_log; 


In this example the script is placed into a /tmp directory but make sure you secure the scripts used in production!

No comments: