DBI Processes

Architectural Considerations

  • Linux bash scripts – No cost!!
  • Implemented in AWS utilizing EC2, MySQL and Redshift
  • Rules-based, metadata-driven processes (see the sketch after this list)
    • Minimizes coding requirements
    • Driven by rules defined in MySQL tables rather than custom code
  • DB Extracts – Allows for Oracle, SQL Server, MySQL and PostgreSQL data integration
  • Redshift DB Loader and File Mover – Service-oriented (based on service bus model)
  • Single process for each service
    • Highly reusable and sustainable
    • Alleviates the need for many point-to-point jobs
  • Extensible framework that can be opened up to users for further self-service capabilities
  • Standard ‘System Log’ table for all processes used for:
    • Job monitoring
    • Dependency checks
  • Standard functions library
    • Error Handling
    • Error Logging
    • Messaging integration with SNS and Email
    • FTP services
  • S3 integrated files
    • Load files
    • SQL scripts
    • Logs
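
As a rough illustration of the metadata-driven approach, the bash sketch below reads rules from a hypothetical MySQL table (dbi_extract_rules) and loops over them. The host, user, database, table and column names are assumptions for illustration only, not the actual DBI schema.

    #!/bin/bash
    # Sketch only: pull active extract rules from a hypothetical MySQL metadata
    # table (dbi_extract_rules) and show what would be dispatched for each job.
    # Host, user, database, table and column names are illustrative.
    # Credentials are expected in ~/.my.cnf or the MYSQL_PWD environment variable.

    MYSQL_HOST="dbi-metadata.example.rds.amazonaws.com"
    MYSQL_USER="dbi_reader"
    MYSQL_DB="dbi_metadata"

    mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -D "$MYSQL_DB" -N -B -e \
      "SELECT job_name, source_db_type, s3_target
         FROM dbi_extract_rules
        WHERE active_flag = 'Y';" |
    while IFS=$'\t' read -r job_name db_type s3_target; do
        echo "Job ${job_name}: extract from ${db_type} to ${s3_target}"
        # The real framework would call the matching extract service here.
    done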

These integration scripts are primarily written as bash shell scripts executed on an AWS EC2 Amazon Linux instance. The scripts interact with your defined AWS MySQL or Aurora instance to obtain the various metadata rules tables, and with the Redshift environments defined in your AWS account. The scripts should be installed on your AWS EC2 instance. Much of the initial configuration is in the various .ini parameter files, which provide the connection details for AWS MySQL and Redshift. The .ini files change little after their initial configuration; from that point forward, most changes are made in the AWS MySQL metadata tables. While many of the bash shell scripts are explicitly called by the Scheduler, many others are managed by our ‘Services Manager’ to run continuously on the AWS EC2 instance. This allows the scheduling of jobs, data loads into Redshift, and the File Mover to run as specific events occur.
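
The exact .ini layout is not shown here, but as a minimal sketch, the example below assumes a simple key=value file (redshift.ini) and shows how a script might read it and connect to Redshift with psql. The file path, key names and use of PGPASSWORD are assumptions, not the actual DBI configuration.

    #!/bin/bash
    # Sketch only: read Redshift connection details from a key=value style .ini
    # file and run a test query with psql. File path and key names are assumed.

    INI_FILE="/opt/dbi/config/redshift.ini"

    # Return the value for a given key in the .ini file
    get_ini_value () {
        grep "^$1=" "$INI_FILE" | cut -d'=' -f2-
    }

    RS_HOST=$(get_ini_value "host")
    RS_PORT=$(get_ini_value "port")
    RS_DB=$(get_ini_value "database")
    RS_USER=$(get_ini_value "user")
    export PGPASSWORD=$(get_ini_value "password")

    psql -h "$RS_HOST" -p "$RS_PORT" -d "$RS_DB" -U "$RS_USER" \
         -c "SELECT current_date;"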

Extensible

While the above defines the framework for BI-related services, I found that these services also provided great value to other functional IT and business teams. Since those teams need processes with the same functionality, engaging them with the same framework provides consistency and synergy. That said, a clear separation between production and non-production processes, and the level of support for each, needed to be defined. In either case, all of the processes allowed specific groups to be notified on job success or failure. Both the functional IT and business teams maintained responsibility for their own jobs, relying on BI only for support of the standard processes and execution of the services. For the functional IT teams, this provided a clear path to engage with their business partners for specific development and integration needs as required.

Again, these teams leveraged the same framework and codebase that BI executed, so upgrades to the service functionality were always available to everyone.

Governance

Opening database integration to functional IT and business teams introduced other data management considerations, especially since everyone shares the same physical infrastructure. For segregation, schemas provide a logical separation for data, views, stored procedures and functions. For business-owned schemas, governance of the schema objects was self-managed by the owning business teams, as it would be challenging for BI to govern all of their work. This allowed the business to move at the pace they desired, avoiding the IT bottleneck. In turn, however, the business teams didn’t always write efficient code, and they executed problematic queries that negatively impacted the overall environment. The primary resolution was to monitor and manage the environment actively, ensuring system stability. The following summarizes the specific considerations integrated into the framework.

  • Identify long-running queries (over 5 minutes), and provide email notifications with query specifics, stats and the command to cancel the query if desired (a minimal sketch of this check follows the list).
  • Cancel queries running over 30 minutes, or that exceed specific thresholds.  Provide an email notification and explanation for cancelling the query.
  • Cancel sessions when the number of queries exceeds specific thresholds.  This ensures users are not running excessive single-row inserts, and it also provides a means to limit non-production processes from running overnight.
  • Provide daily DBMS performance and usage statistics to all teams.
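
As a minimal sketch of the long-running-query check in the first bullet, the script below reads Redshift's STV_RECENTS system table (which reports duration in microseconds) and publishes an alert. The cluster endpoint, database, user and SNS topic ARN are placeholders, and the actual DBI framework may notify through its own functions library rather than a direct aws sns call.

    #!/bin/bash
    # Sketch only: flag queries running longer than 5 minutes and send an alert
    # with the CANCEL command an administrator could issue. Endpoint, database,
    # user and the SNS topic ARN below are placeholders.

    THRESHOLD_US=$((5 * 60 * 1000000))   # 5 minutes in microseconds

    psql -h my-cluster.example.redshift.amazonaws.com -p 5439 \
         -d analytics -U dbi_admin -A -t -F $'\t' -c \
      "SELECT pid, user_name, duration, trim(query)
         FROM stv_recents
        WHERE status = 'Running'
          AND duration > ${THRESHOLD_US};" |
    while IFS=$'\t' read -r pid user_name duration query_text; do
        msg="Long running query pid=${pid} user=${user_name} elapsed=$((duration / 1000000))s. To cancel: CANCEL ${pid}; SQL: ${query_text}"
        # Notifications go out via SNS here; email integration works similarly.
        aws sns publish --topic-arn "arn:aws:sns:us-east-1:123456789012:dbi-alerts" \
                        --message "$msg"
    done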

If you are interested in more information on how to leverage the processes and tools we have available for free, please use the contacts page to request more information.

Our solution requires an AWS EC2 instance for the execution of processes, an AWS RDS MySQL instance for storing the metadata rules, AWS Redshift, and AWS S3 buckets to load and unload data with Redshift. All processing is logged in a standardized log table, which can be used for signaling, reporting and trending.
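
As a rough idea of what such a standardized log table might look like, the sketch below creates one in Redshift via psql. The column names are assumptions for illustration only and not the actual DBI System Log definition (see the link below); the cluster endpoint and user are placeholders.

    #!/bin/bash
    # Rough idea only: a standardized process log table of this kind.
    # Column names are assumed, not the actual DBI System Log layout.
    # Typical status values might be STARTED, SUCCESS, FAILURE.

    psql -h my-cluster.example.redshift.amazonaws.com -p 5439 \
         -d analytics -U dbi_admin -c \
      "CREATE TABLE IF NOT EXISTS system_log (
           log_id        BIGINT IDENTITY(1,1),
           process_name  VARCHAR(256),
           job_name      VARCHAR(256),
           status        VARCHAR(32),
           message       VARCHAR(4096),
           start_time    TIMESTAMP,
           end_time      TIMESTAMP
       );"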

DBI System Log

Click here for more details
