Feed aggregator

Converting columns from one data type to another in PostgreSQL

Yann Neuhaus - Mon, 2019-07-08 00:19

Usually you should use the data type that best fits the representation of your data in a relational database. But how many times did you see applications that store dates or numbers as text or dates as integers? This is not so uncommon as you might think and fixing that could be quite a challenge as you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data it might be easy to fix or it might become more complicated. PostgreSQL has a quite clever way of doing that.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: Column “a” is an integer and column “b” is of type text. For humans it seems obvious that the second column in reality contains a date but stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? It will probably break the application, as the column name changes, and there is no way to avoid that. Is there a better way? Let’s start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert "b" to a real date without changing the name of the column? Let's try the most obvious way first and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work, as PostgreSQL in this case cannot know how to go from one data type to the other. But the “HINT” already tells us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

For our data in the “b” column that does work. But consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do this conversion, it fails — no surprise here. But you still have the power to do it by providing a function that implements the conversion in exactly the way you want:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function will return the matching date and not a constant as in this example. For demonstration purposes we will go with this fake function:

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact conversion logic by calling our own function. As long as the output of the function conforms to the target data type and you did not make any mistakes, you can potentially go from any source data type to any target data type.
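Just to illustrate what a real implementation could look like for the fake data used above, here is a sketch (it assumes the day words are limited to the ones in the demo and that the two-digit year means 20xx):

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
  lv_day   text := split_part(pv_text,'--',1);
  lv_month text := split_part(pv_text,'--',2);
  lv_year  text := split_part(pv_text,'--',3);
  lv_dd    int;
begin
  -- map the spelled out day words of the demo data to numbers
  lv_dd := case lv_day
             when 'First'  then 1
             when 'Second' then 2
             when 'Third'  then 3
           end;
  -- assumption: the two-digit year means 20xx
  return to_date(lv_dd || '-' || lv_month || '-20' || lv_year, 'DD-Month-YYYY');
end;
$$ language plpgsql;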

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

Second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks

Yes, that will block, so you should plan such actions carefully when you have a busy system. But this is still better than adding a new column.
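If you want to see what exactly is waiting while the conversion is running, a quick look at pg_stat_activity from a third session shows the blocked query and its wait event (these columns exist in recent PostgreSQL versions):

postgres=# select pid, wait_event_type, wait_event, state, query
             from pg_stat_activity
            where state <> 'idle';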


Telling the PostgreSQL optimizer more about your functions

Yann Neuhaus - Sun, 2019-07-07 05:29

When you reference or call functions in PostgreSQL the optimizer does not really know much about the cost, nor about the number of rows a function returns. This is not really surprising, as it is hard to predict what a function is doing and how many rows will be returned for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned, but the optimizer assumes 1000 rows. This is the documented default: no matter how many rows really will be returned, PostgreSQL estimates 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows a function will return. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (based on cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;
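Both settings end up in the catalog, so you can verify what the planner now knows about the function by looking at prorows and procost in pg_proc:

postgres=# select proname, prorows, procost
             from pg_proc
            where proname = 'f_tmp';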

If you use set-returning functions, remember that you can give the optimizer more information and that the default row estimate is 1000.


SQL Server containers and docker network driver performance considerations

Yann Neuhaus - Fri, 2019-07-05 01:45

A few months ago I attended Franck Pachot’s session about microservices and databases at SOUG Romandie in Lausanne, on May 21st 2019. He covered some performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer had installed some SQL Server 2017 Linux containers in a Docker infrastructure with the user applications located outside of this infrastructure. It is probably an uncommon way to start with containers, but anyway, when you immerse yourself in the Docker world you quickly notice that there are a lot of network drivers and considerations to be aware of, and just for the sake of curiosity I proposed to my customer to perform some network benchmark tests to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The customer’s initial scenario included a standalone Docker infrastructure, and we considered different approaches to the application network configuration from a performance perspective. We did the same for the second scenario, a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference with no network management overhead, directly from the network host. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to make sure the network throughput is what we really expect, and I have had some surprises on this topic in the past. So, back to the container world: each test was performed from a Linux host outside the concerned Docker infrastructure, according to the customer scenario.

The link speed of the network card attached to the Docker host is supposed to be 10 Gbit/sec …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s
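For completeness, the client side of each test was driven from the external Linux host with something along these lines (host name, port and duration here are placeholders rather than the exact values we used):

$ iperf3 -c docker1.dbi-services.test -p 5201 -t 30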

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container’s network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a Docker container will use this bridge by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn’t matter for the tests we performed here.

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 Docker container I ran for my tests is using the default bridge network, as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the concerned container.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here the output after running the iperf3 benchmark:

It’s worth noting that the bridge network adds some overhead, with an impact of 13% in my tests, but this is an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before looking at the iperf3 result, I think we have to answer the million-dollar question here: what is the docker-proxy? Have you ever paid attention to this process on your Docker host? Let’s run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the container providing the service … but it is only used in particular circumstances. Just bear in mind that controlling access to a container’s service is mostly done through the host netfilter framework, in both the NAT and filter tables, and the docker-proxy mechanism is required only when this method of control is not available:

  • When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the kernel parameter net.ipv4.ip_forward is set to 0. This is not my case here.
  • When you are using localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel does not allow routing traffic from it. Therefore it is not possible to apply netfilter NAT rules, and instead netfilter sends packets through the filter table’s INPUT chain to a local process listening on the port published by docker-proxy:
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I’m using the localhost keyword in my connection string, so the docker-proxy comes into play.

A huge performance impact for sure, at about 28%. This performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I could simply disable it with the Docker daemon parameter “userland-proxy”: false, but I would say this is a case we would not encounter in practice, because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the Docker host gives a very different outcome, similar to the Docker bridge network scenario.
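For reference, disabling the userland proxy is done in the daemon configuration file, typically /etc/docker/daemon.json, followed by a restart of the Docker daemon (we did not change this for the tests described here):

{
  "userland-proxy": false
}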

 

Using an overlay network

Using a single Docker host doesn’t fit well with HA or scalability requirements, and I strongly doubt any customer will go this way in a mission-critical environment. I recommended that my customer consider an orchestrator like Docker Swarm or K8s to anticipate the container workload coming from future projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. According to the Docker documentation, overlay networks manage communications among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the Docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.
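For reference, a user-defined and attachable overlay network could be created like this (we stayed with the default networks for these tests; the network name is just a placeholder):

$ docker network create --driver overlay --attachable demo-overlay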

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm

 

Here is the result of the iperf3 benchmark:

Well, the same result as the previous test, with roughly a 30% performance drop. Compared to the initial reference this is again an expected outcome, but I didn’t expect the impact to be that big in such a case. The overlay network introduces additional overhead by putting together, behind the scenes, a VXLAN tunnel (a virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for the encapsulation/de-encapsulation work, and traffic encryption by default.

Here is a summary of the different scenarios and their performance impact:

Scenario                   Throughput (Gbit/s)   Performance impact
Host network               10.3
Docker host network        10.3
Docker bridge network      8.93                  0.78
Docker proxy               7.37                  0.71
Docker overlay network     7.04                  0.68

 

In the particular case of my customer, where the SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it’s clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with only a few SQL Server containers. But in this case we have to change the SQL Server default listen port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn’t provide port mapping capabilities. According to our tests, we didn’t get any evidence of performance improvement in terms of application response time between the Docker network drivers, probably because those applications are not network bound here. But I can imagine scenarios where they could be. Finally, the kind of scenario encountered here is likely uncommon, and I see containerized apps with database components outside the Docker infrastructure more often, but that doesn’t change the game at all and the same considerations apply … Today I’m very curious to test real microservices scenarios where database and application components all sit on a Docker infrastructure.
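As a rough sketch of what that could look like (the image tag, password and port below are placeholders, not the customer’s values), running a SQL Server container on the host network with a non-default port would be something like:

$ docker run -d --name sql1 --net=host \
     -e "ACCEPT_EULA=Y" \
     -e "SA_PASSWORD=YourStrong!Passw0rd" \
     -e "MSSQL_TCP_PORT=1455" \
     mcr.microsoft.com/mssql/server:2017-latest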

See you!

 


Using DbVisualizer to work with #Oracle, #PostgreSQL and #Exasol

The Oracle Instructor - Tue, 2019-07-02 09:01

As a Database Developer or Database Administrator, it becomes increasingly unlikely that you will work with only one platform.

It’s quite useful to have one single tool to handle multiple different database platforms. And that’s exactly the ambition of DbVisualizer.

As a hypothetical scenario, let’s assume you are a database admin who works on a project to migrate from Oracle to EDB Postgres and Exasol.

The goal might be to replace the corporate Oracle database landscape, moving the OLTP part to EDB Postgres and the DWH / Analytics part to Exasol.

Instead of having to switch constantly between say SQL Developer, psql and EXAplus, a more efficient approach would be using DbVisualizer for all three.

I created one connection for each of the three databases here for my demo:

Now let’s see if statements I do in Oracle also work in EDB Postgres and in Exasol:

Oracle

EDB

Exasol

Works the same for all three! The convenient thing here is that I just had to select the Database Connection from the pull down menu while leaving the statement as it is. No need to copy & paste even.

What about schemas and tables?

Oracle

In EDB, I need to create a schema accordingly:
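(The screenshots are not reproduced in this text version; conceptually the step boils down to something like the following, where the schema and table names are just made-up placeholders:)

create schema demo_schema;
create table demo_schema.t1 ( id number, name varchar2(50) );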

EDB

 

In Exasol, schema and table can be created in the same way:

Exasol

Notice that the data types got silently translated into the proper Exasol data types:

Exasol

There is no DBA_TABLES in Exasol, though:

Exasol

Of course, there’s much more to check and test for a migration, but I think you get an idea of how a universal SQL client like DbVisualizer might help with such a project.

 


opt_estimate 4

Jonathan Lewis - Mon, 2019-07-01 07:18

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t necessary.)

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint, and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature, after creating a table t2 as a copy of t1 but with pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat: hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual figure in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan, and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

Modifying pg_hba.conf from inside PostgreSQL

Yann Neuhaus - Sat, 2019-06-29 07:14

During one of the sessions at the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there currently is no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions there is a way to do it, though. First, let’s check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information we can load the file into a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
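The new rule should now also show up in pg_hba_file_rules, and the error column of that view is a convenient way to verify that the file is still valid:

postgres=# select line_number, type, database, user_name, error
             from pg_hba_file_rules
            where error is not null;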

Of course: Use with caution!
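If you need this more than once, the whole procedure can be wrapped into a function along the lines of the following sketch (it assumes superuser privileges, relies on the server-side COPY used above, and the function name is of course made up):

create or replace function append_hba_rule ( pv_rule text ) returns boolean
as $$
declare
  lv_hba_file text;
begin
  -- location of pg_hba.conf as reported by the running instance
  select setting into lv_hba_file from pg_settings where name = 'hba_file';
  -- load the current file, append the new rule and write everything back
  create temporary table hba_tmp ( lines text ) on commit drop;
  execute format('copy hba_tmp from %L', lv_hba_file);
  insert into hba_tmp ( lines ) values ( pv_rule );
  execute format('copy hba_tmp to %L', lv_hba_file);
  -- make the server re-read its configuration
  return pg_reload_conf();
end;
$$ language plpgsql;

Adding a rule then becomes a single call like select append_hba_rule('host  all  mydb  ::1/128  trust');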


opt_estimate 3

Jonathan Lewis - Fri, 2019-06-28 07:12

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are likely to be most useful, since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause, so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by option I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries: first the unhinted baseline (which uses the standard 5% for my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

Should the Oracle APEX Community Care About Autonomous Database?

Joel Kallman - Fri, 2019-06-28 06:14


This past week, Oracle announced the availability of Oracle APEX, SQL Developer Web and Oracle REST Data Services on Oracle Autonomous Database.  If you're in the APEX community, should you care?  I say "absolutely yes!", but not for the reasons you might suspect.

Autonomous Database is strategic to Oracle.  Just read the transcript from the recent Oracle quarterly earnings conference call and it will be obvious to you.  Autonomous is an advancement in technology that has significant investment from Oracle and very real benefits for customers.  It's a clear market differentiator - I do truly believe this, it's not merely my marketing spin.  And now, with the addition of Oracle APEX & SQL Developer Web & Oracle REST Data Services, I think this combination of technologies provides even more capabilities to this platform and even greater differentiation.  What other service provides elastic, autonomous capabilities, application design and proven low code application development, out-of-the-box?  Did I mention that this also happens to include the world's most popular database, Oracle Database?

The benefits of low code application development are real.  And Low Code + Autonomous Database is the ideal combination.  Low code is about reducing costs, delivering faster, with greater consistency, and being usable by a broader range of skill sets.  Some of the benefits of Autonomous Database are equivalent - less cost, instant availability, usable by others who may not be world-class experts.  It has been a long multi-year confluence of events that has brought us together here.

The APEX community is the envy of others at Oracle.  Even people who aren't APEX fans recognize the APEX community's passion.  But where did this come from?  Do people really get excited about a tool?  No.  They get excited about what they can do with a tool - how it helps them deliver a solution, and be successful.  A carpenter doesn't get passionate about his dual-slide compound miter saw because it's a cool tool.  He gets satisfaction about what he can actually do with that tool versus a hand saw.  When you get a pay raise or praise or a promotion because of what you've been able to deliver with APEX and ORDS and Oracle Database, that's a reason to get excited!  And I think that is ultimately the real story behind the enviable, tangible energy in the APEX community.  Countless people have had many great successes with this combination of technologies, and success begets success.

Let's say you're in the APEX community, you saw this announcement about APEX on Autonomous, but you're not interested in cloud.  Or, as Andre de Souza so eloquently stated on Twitter, "I know it’s big news, just does not affect 99,9% of current #orclapex developers I’m guessing."  Should you care?  I say yes, and here's why.  The great APEX community that I mention above, which has been so successful with APEX & ORDS & Oracle Database over the years, has become very large across the globe, and with not a lot of help from Oracle.  Make no mistake - Oracle does invest in APEX, millions of dollars every year.  But I still come across Oracle Database customers who have simply never heard of APEX.  This is because there has not been much promotion from Oracle marketing or public relations or even sales.  All of this is about to change.  Why?  Because APEX is on Autonomous Database, and Autonomous Database is strategic to Oracle.  You will probably see more communication and discussion from Oracle about APEX than in the last 20 years combined.  Low code resonates with customers, APEX is proven, and everyone has application development needs.

How does this benefit someone in the APEX community?  Simple:

  1. Awareness and interest will rise among people who have never heard about APEX before, both existing on-premises customers and net new customers.
  2. There will be greater demand for APEX and database development talent.  If you have experience with APEX, with a proven track record of delivering solutions with APEX, you're a very attractive person.  Perhaps the rate you charge has now gotten a bit higher.  You'll certainly gain upward mobility.
  3. You'll no longer have to introduce someone to APEX for the very first time, or counter the claim that "it's not strategic."
  4. As our friends from Explorer UK say, with APEX, they "develop cloud ready applications".  And you've been doing this for years.  Don't be afraid to make this claim.  When and if you're ready for cloud, you're already out of the gate.  The same APEX apps you developed on-premises run and look and feel exactly the same in the cloud.  Who has been developing cloud-ready apps for years?  You!

So.  Even if you're not into "cloud" but into APEX, this announcement and these capabilities on Autonomous Database have a material impact on you and everyone else in the APEX community.  Your skills and experience will become more valued, and we should expect the market and interest and demand to grow.

Everything is not perfect, and we on the APEX team still have a lot of very hard work ahead of us.  But these are exciting times and it's what we've labored on for the past 20 years, to get to this point.  For those who have been with the APEX community for so many years, congratulations!  You've bet on the right horse.  Just fasten your seat belt.

Windocks and K8s support

Yann Neuhaus - Fri, 2019-06-28 00:33

I recently got the 4.08 update from the Windocks team and I was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let’s say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks, we had to write custom scripts to deploy applications on K8s that were tied to a Windocks container. With the new version 4.08, this process may be simplified because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists of deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks Server. Once the SQL Server proxy is deployed, a corresponding Windocks container is spun up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks Server, authentication is required and we need to provide credential information that will be stored in a K8s secret (proxy-secrets below). The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'
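As a quick sanity check (my addition, not part of the original post), you can confirm that the three keys landed in the secret – kubectl describe only lists the key names and byte sizes, so the values themselves are not echoed:

$ kubectl describe secret proxy-secrets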

 

The next step consists of deploying the Windocks SQL proxy with the specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (the Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (the Windocks base image used for the container) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"
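
A side note from me: the manifest above targets the API versions available at the time of writing. On Kubernetes 1.16 and later the extensions/v1beta1 Deployment API has been removed, so a roughly equivalent (sketched, not tested here) header would be apps/v1 with a mandatory selector matching the pod labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sqlproxy-secure
      tier: frontend
  template:
    # ... same pod template (labels, containers, envFrom/env) as shown above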

 

If we want to make the SQL Proxy pod accessible from outside, a service is needed, but this is not mandatory depending on the context. Note that you may also use a TLS connection to secure the network traffic between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend

 

Let’s give it a try on my Azure infrastructure, which includes an AKS cluster and a Windocks Server installed in an Azure VM. I also took the opportunity to create my own helm chart from the YAML files provided by the Windocks team. It will certainly make my deployments easier. Here is the command I used to deploy my Windocks helm chart on my AKS cluster.

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .
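
As a rough sketch only (the chart is my own and its full structure is not shown here), the values.yaml behind those --set switches might look something like this:

Windocks:
  Image: 2012_ci      # Windocks base image used for the container
  Port: 3089          # TCP port published by the SQL proxy service
  PortSSL: 3090       # TLS port published by the SQL proxy service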

 

The deployment is performed in a specific namespace named dmk, and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container through the SQL Proxy deployed on K8s, using port 3089. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL Proxy pod and the Windocks SQL Proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy redirects all connections from port 3089 to the container port after spinning up the corresponding Windocks container on the Windocks server. We can get some details by taking a look at the SQL Proxy logs on K8s. As a reminder, the container port is allocated dynamically by the Windocks server by default, and the SQL proxy picks it up automatically for connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let’s try to connect by using mssql-cli with the external IP of the SQL Proxy service and port 3089. The connection redirect is effective and I can interact with my Windocks container on local port 10001, as the query output below shows.
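The mssql-cli invocation itself is not reproduced here; as a hedged sketch (the external IP is masked in the service listing above, so the address below is purely illustrative, and the SA password is the one stored earlier in the proxy-secrets secret), the connection might look like this:

$ mssql-cli -S xx.xx.xxx.xxx,3089 -U sa -P 'sa_password'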

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server came up with my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server, I can get a picture of the provisioned containers. The interesting one in our case is referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell, this is a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 

This article Windocks and K8s support first appeared on Blog dbi services.

work Agile in a GxP-regulated environment

Yann Neuhaus - Thu, 2019-06-27 09:24

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping for learning) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel was to see some very special type of literature that day. Not a formal presentation but a role play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. With its rigidity and intended plannability it of course corresponds largely to the waterfall model of traditional software development. Opposed to this, the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like “Working software over comprehensive documentation” and “Responding to change over following a plan”. But how would you ever get acceptance in the compliance world without full documentation and planning!?! When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. This time, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management this is reflected in the rigid rules of testing and documentation, to name just the most obvious examples. The background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance, quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules has to go through complicated, formal administrative procedures and is hence very slow. Consider this in our fast-moving (not only IT) world!

What is an MVP?

A nice play on words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, it is just not the same one: MVP = Master Validation Plan (in GxP Validation) versus Minimum Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile Development like Customer focus, Flexibility and Speed into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Inclusion of Validation into the “daily” Development activities is another must, instead of adding Validation only at the end of the Development phases. Yet another core principle from the Agile side is ensuring team Maturity and Mindset. Much-hailed Diversity is opposed to pure compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to – methodically – combine or maybe rather “emulsify” Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do it.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com)  for organizing this creative and informative event.

 

This article work Agile in a GxP-regulated environment first appeared on Blog dbi services.

Glitches

Jonathan Lewis - Wed, 2019-06-26 11:11

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a risk that the whole data set for ‘clerk’ will be accessed – that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “fetch first” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data

Yann Neuhaus - Wed, 2019-06-26 10:23

For once, a blog post in French, since it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva there was a presentation on Oracle security.
The idea was to put ourselves in a hacker's shoes in order to better understand how they operate, and then to look at how to protect the data in an Oracle database.
Understanding the intentions and the means hackers use to achieve their goals helps us fight them more effectively.

The session was led by Hakim Loumi – EMEA DB Security PM.
First, the speaker showed an estimate of data growth by 2025. Wow, 175 ZB!

And how sensitive that data can be.

The number of attacks worldwide is impressive.

Hakim also showed why hacking has become one of the biggest scourges. In the slide below, for an investment of less than $10, a complete identity was worth $240 on average. What a return on investment!

Obviously, the database is a prime target for attacks.

The speaker then presented the main tools provided by Oracle, most of them requiring the Advanced Security Option:

Data Masking
Data Redaction
Oracle Firewall
Database Vault

And to finish, this nice slide.

Conclusion

The event was really interesting. Through simple examples and anecdotes, the speaker managed to hold the audience's attention. The main takeaway from this presentation is that data security is not just the DBA's business: it is a whole chain of procedures involving everyone in the company.

This article Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data first appeared on Blog dbi services.

Oracle Developer Tools - Do They Still Exist?

Andrejus Baranovski - Wed, 2019-06-26 01:55
People are frustrated about @OracleADF @JDeveloper on social media - "ADF boat has no captain", etc. I agree @Oracle is to blame big time for such lame handling of its own Developer Tools stack. @Oracle please wake up and spend some budget on @OracleADF. Read more:

Oracle VBCS - right now this tool gets the most of Oracle focus. Supposed to offer declarative #JavaScript development experience in the Cloud. Not well received by the community. Are there any VBCS customers, please respond if yes?

Oracle APEX - comes with a very strong community (mostly backed by DB folks). But it is not strategic for Oracle. It is more likely to be used by PL/SQL guys than by Java or Web developers.

Oracle JET - highly promoted by Oracle. Set of opensource #JavaScript libs, glued by Oracle layer. Nice, but can't be used as a direct replacement for @OracleADF, JET is UI layer only. Oracle folks often confuse community by saying - Oracle JET is a great option to replace ADF

Oracle Forms - still alive, but obviously can't be strategic Oracle platform. A few years ago, Oracle was promoting Forms modernization to @OracleADF

Summary - Oracle's developer tools offering is weak. The lack of Oracle investment in development tools makes the Oracle developer community shrink.

opt_estimate 2

Jonathan Lewis - Tue, 2019-06-25 14:22

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the appearance of the nested loop goes back to the original pre-9i form that happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15*15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
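
Just to illustrate the rows=nnn form mentioned above (my sketch, not one of the original test cases), the equivalent hint asking for a fixed estimate of one row per probe – which in this example should have much the same effect as scale_rows=0.06, since 0.06 * 15 rounds to 1 – would be:

opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, rows=1)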

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale its estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: If you have access to MoS you’ll find that Doc ID 2402821.1, “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

SQLcl ALIAS – because you can’t remember everything.

The Anti-Kyte - Tue, 2019-06-25 08:47

I want to find out which file is going to hold any trace information generated by my database session. Unfortunately, I keep forgetting the query that I need to run to find out.
Fortunately I’m using SQLcl, which includes the ALIAS command.
What follows is a quick run-through of this command including :

  • listing the aliases that are already set up in SQLcl
  • displaying the code that an alias will execute
  • creating your own alias interactively
  • deleting an alias
  • using files to manage custom aliases

Whilst I’m at it, I’ll create the alias for the code to find that pesky trace file too.

In the examples that follow, I’m connected to an Oracle XE18c PDB using SQLcl 18.4 from my Ubuntu 16.04 LTS laptop via the Oracle Thin Client. Oh, and the Java details are :

Meet the ALIAS command

As so often in SQLcl, it’s probably a good idea to start with the help :

help alias

…which explains that :

“Alias is a command which allows you to save a sql, plsql or sqlplus script and assign it a shortcut command.”

A number of aliases are already included in SQLcl. To get a list of them simply type :

alias

…which returns :

locks
sessions
tables
tables2

If we want to see the code that will be run when an alias is invoked, we simply need to list the alias :

alias list tables

tables - tables <schema> - show tables from schema
--------------------------------------------------

select table_name "TABLES" from user_tables

Connected as HR, I can run the alias to return a list of tables that I own in the database :

Creating an ALIAS

To create an alias of my own, I simply need to specify the alias name and the statement I want to associate it with. For example, to create an alias called whoami :

alias whoami =
select sys_context('userenv', 'session_user')
from dual;

I can now confirm that the alias has been created :

alias list whoami
whoami
------

select sys_context('userenv', 'session_user')
from dual

…and run it…

I think I want to tidy up that column heading. I could do this by adding a column alias in the query itself. However, the alias command does support the use of SQL*Plus commands…

alias whoami =
column session_user format a30
select sys_context('userenv', 'session_user') session_user
from dual;

…which can make the output look slightly more elegant :

A point to note here is that, whilst it is possible to include SQL*Plus statements in an alias for a PL/SQL block (well, sort of)…

alias whoami=set serverout on
exec dbms_output.put_line(sys_context('userenv', 'session_user'));

…when the alias starts with a SQL*Plus statement, it will terminate at the first semi-colon…

Where you do have a PL/SQL alias that contains multiple statement terminators (‘;’) you will need to run any SQL*Plus commands required prior to invoking it.
Of course, if you find setting output on to be a bit onerous, you can save valuable typing molecules by simply running :

alias output_on = set serverout on size unlimited
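
With that alias in place, a working pattern (my sketch – it assumes whoami has been redefined to contain just the dbms_output call, without the leading set serverout) would be to run the two aliases in sequence :

output_on
whoami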

I can also add a description to my alias so that there is some documentation when it’s listed :

alias desc whoami The current session user

When I now list the alias, the description is included…more-or-less…

I’m not sure if the inclusion of the text desc whoami is simply a quirk of the version and OS that I’m running on. In any case, we’ll come to a workaround for this minor annoyance in due course.

In the meantime, I’ve decided that I don’t need this alias anymore. To remove it, I simply need to run the alias drop command :

alias drop whoami


At this point, I know enough about the alias command to implement my first version of the session tracefile alias that started all this.
The query that I keep forgetting is :

select value
from v$diag_info
where name = 'Default Trace File'
/

To create the new alias :

alias tracefile =
select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File';

I’ll also add a comment at this point :

alias desc tracefile The full path and filename on the database server of the tracefile for this session

My new alias looks like this :
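Listing it shows something along these lines (allowing for the description quirk mentioned earlier) :

alias list tracefile

tracefile
---------

select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File'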

The aliases.xml file

Unlike the pre-supplied aliases, the code for any alias you create will be held in a file called aliases.xml.

On Windows, this file will probably be somewhere under your OS user’s AppData directory.
On Ubuntu, it’s in $HOME/.sqlcl

With no custom aliases defined the file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases/>

Note that, even though I have now defined a custom alias, it won’t be included in this file until I end the SQLcl session in which it was created.

Once I disconnect from this session, the file includes the new alias definition :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="tracefile">
<description><![CDATA[desc tracefile The full path and filename on the database server of the tracefile for this session
]]></description>
<queries>
<query>
<sql><![CDATA[select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File']]></sql>
</query>
</queries>
</alias>
</aliases>

Incidentally, if you’ve played around with SQLDeveloper extensions, you may find this file structure rather familiar.

The file appears to be read by SQLcl once on startup. Therefore, before I run SQLcl again, I can tweak the description of my alias to remove the extraneous text…

<description><![CDATA[The full path and filename on the database server of the tracefile for this session]]></description>

Sure enough, next time I start an SQLcl session, this change is now reflected in the alias definition :

Loading an alias from a file

The structure of the aliases.xml file gives us a template we can use to define an alias in the comfort of a text editor rather than on the command line. For example, we have the following PL/SQL block, which reads a bind variable :

declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
/

Rather than typing this in on the command line, we can create a file (called pep_talk.xml) which looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="pep_talk">
<description><![CDATA[How are you feeling ? Usage is pep_talk <emotion>]]></description>
<queries>
<query>
<sql><![CDATA[
declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
]]></sql>
</query>
</queries>
</alias>
</aliases>

Now, we can load this alias from the file as follows :

alias load pep_talk.xml
Aliases loaded

We can now execute our new alias. First though, we need to remember to turn serveroutput on before we invoke it :
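Assuming SQLcl binds the argument to :mood (which is what the usage note in the description implies), a run might look something like this, with the message varying according to the day and time :

set serverout on size unlimited
pep_talk bad

At least it's not Monday!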

Once you’ve terminated your SQLcl session, the new alias will be written to aliases.xml.

Exporting custom aliases

There may come a time when you want to share your custom aliases with your colleagues. After all, it’s always useful to know where the trace file is and who doesn’t need a pep talk from time-to-time ?

To “export” your aliases, you can issue the following command from SQLcl :

alias save mike_aliases.xml

This writes the file to the same location as your aliases.xml :

You can then import these aliases to another SQLcl installation simply by sharing the file and then using the alias load command.

References

As you can imagine, there are a wide variety of possible uses for the ALIAS command.

As Kris Rice is the original author of this feature, his post is probably worth a read.
Jeff Smith has written on this topic several times, including :

Menno Hoogendijk has an example which employs some Javascript wizardry which he has published on GitHub.

Right, back to my trace files.

ANSI bug

Jonathan Lewis - Sat, 2019-06-22 07:01

The following note is about a script that I found on my laptop while I was searching for some details about a bug that appears when you write SQL using the ANSI style format rather than traditional Oracle style. The script is clearly one that I must have cut and pasted from somewhere (possibly the OTN/ODC database forum) many years ago without making any notes about its source or resolution. All I can say about it is that the file has a creation date of July 2012 and I can’t find any reference to a problem through Google searches – though the tables, and even a set of specific insert statements, appear in a number of pages that look like coursework for computer studies, and MOS (My Oracle Support) has a similar-looking bug “fixed in 11.2”.

Here’s the entire script:

rem
rem     Script:         ansi_bug.sql
rem     Author:         ???
rem     Dated:          July 2012
rem

CREATE TABLE Student (
  sid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  address VARCHAR(20) NOT NULL,
  major CHAR(2)
);

CREATE TABLE Professor (
  pid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  department VARCHAR(10) NOT NULL
);

CREATE TABLE Course (
  cid INT PRIMARY KEY,
  title VARCHAR(20) NOT NULL UNIQUE,
  credits INT NOT NULL,
  area VARCHAR(5) NOT NULL
);

CREATE TABLE Transcript (
  sid INT,
  cid INT,
  pid INT,
  semester VARCHAR(9),
  year CHAR(4),
  grade CHAR(1) NOT NULL,
  PRIMARY KEY (sid, cid, pid, semester, year),
  FOREIGN KEY (sid) REFERENCES Student (sid),
  FOREIGN KEY (cid) REFERENCES Course (cid),
  FOREIGN KEY (pid) REFERENCES Professor (pid)
);

INSERT INTO Student (sid, name, address, major) VALUES (101, 'Nathan', 'Edinburg', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (105, 'Hussein', 'Edinburg', 'IT');
INSERT INTO Student (sid, name, address, major) VALUES (103, 'Jose', 'McAllen', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (102, 'Wendy', 'Mission', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (104, 'Maria', 'Pharr', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (106, 'Mike', 'Edinburg', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (107, 'Lily', 'McAllen', NULL);

INSERT INTO Professor (pid, name, department) VALUES (201, 'Artem', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (203, 'John', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (202, 'Virgil', 'MATH');
INSERT INTO Professor (pid, name, department) VALUES (204, 'Pearl', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (205, 'Christine', 'CS');

INSERT INTO Course (cid, title, credits, area) VALUES (4333, 'Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (1201, 'Comp literacy', 2, 'INTRO');
INSERT INTO Course (cid, title, credits, area) VALUES (6333, 'Advanced Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (6315, 'Applied Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (3326, 'Java', 3, 'PL');
INSERT INTO Course (cid, title, credits, area) VALUES (1370, 'CS I', 4, 'INTRO');

INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 4333, 201, 'Spring', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6315, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 4333, 203, 'Summer I', '2010', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (102, 4333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 3326, 204, 'Spring', '2008', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1201, 205, 'Fall', '2009', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1370, 203, 'Summer II', '2010', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1201, 205, 'Fall', '2009', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1370, 203, 'Summer II', '2010', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 3326, 204, 'Spring', '2001', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 6315, 203, 'Fall', '2008', 'A');

SELECT 
        pid, 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
order by pid
;

I’ve run three minor variations of the same query – the one in the middle selects two columns from a three-table join using natural joins. The first query does the same but includes an extra column in the select list, while the third query selects only the original columns but orders the result set by the extra column.

The middle query returns 60 rows – the first and third, with the “extra” column projected somewhere in the execution plan, return 13 rows.

I didn’t even have a note of the then-current version of Oracle when I copied this script, but I’ve just run it on 12.2.0.1, 18.3.0.0, and 19.2.0.0 (using LiveSQL), and the anomaly reproduces on all three versions.
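For anyone who wants to dig further, the middle query rewritten with explicit join conditions would look something like the following, assuming the natural joins resolve to cid (Transcript/Course) and pid (Professor) as the only common columns; comparing its row count with the natural-join version should show which of the two results is the correct one:

select
        p.name, tc.title
from
        Professor p
left outer join
        (
        select  t.pid, c.title
        from    Transcript t
        join    Course c
        on      c.cid = t.cid
        ) tc
on
        tc.pid = p.pid
;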

Ubuntu Server: How to activate kernel dumps

Dietrich Schroff - Fri, 2019-06-21 14:25
If you are running Ubuntu Server, you can add kdump to your system to write kernel dumps in case of sudden reboots, kernel panics, etc.

Installing is very easy:
root@ubuntuserver:/etc# apt install linux-crashdump
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 makedumpfile
Suggested packages:
  binutils-doc
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 linux-crashdump makedumpfile
0 upgraded, 11 newly installed, 0 to remove and 43 not upgraded.
Need to get 2,636 B/5,774 kB of archives.
After this operation, 26.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-crashdump amd64 4.15.0.46.48 [2,636 B]
Fetched 2,636 B in 0s (28.1 kB/s)    
Preconfiguring packages ...
Selecting previously unselected package binutils-common:amd64.
(Reading database ... 66831 files and directories currently installed.)
Preparing to unpack .../00-binutils-common_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libbinutils:amd64.
Preparing to unpack .../01-libbinutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils-x86-64-linux-gnu.
Preparing to unpack .../02-binutils-x86-64-linux-gnu_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils.
Preparing to unpack .../03-binutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libsnappy1v5:amd64.
Preparing to unpack .../04-libsnappy1v5_1.1.7-1_amd64.deb ...
Unpacking libsnappy1v5:amd64 (1.1.7-1) ...
Selecting previously unselected package crash.
Preparing to unpack .../05-crash_7.2.1-1ubuntu2_amd64.deb ...
Unpacking crash (7.2.1-1ubuntu2) ...
Selecting previously unselected package kexec-tools.
Preparing to unpack .../06-kexec-tools_1%3a2.0.16-1ubuntu1_amd64.deb ...
Unpacking kexec-tools (1:2.0.16-1ubuntu1) ...
Selecting previously unselected package libdw1:amd64.
Preparing to unpack .../07-libdw1_0.170-0.4_amd64.deb ...
Unpacking libdw1:amd64 (0.170-0.4) ...
Selecting previously unselected package makedumpfile.
Preparing to unpack .../08-makedumpfile_1%3a1.6.3-2_amd64.deb ...
Unpacking makedumpfile (1:1.6.3-2) ...
Selecting previously unselected package kdump-tools.
Preparing to unpack .../09-kdump-tools_1%3a1.6.3-2_amd64.deb ...
Unpacking kdump-tools (1:1.6.3-2) ...
Selecting previously unselected package linux-crashdump.
Preparing to unpack .../10-linux-crashdump_4.15.0.46.48_amd64.deb ...
Unpacking linux-crashdump (4.15.0.46.48) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up libdw1:amd64 (0.170-0.4) ...
Setting up kexec-tools (1:2.0.16-1ubuntu1) ...
Generating /etc/default/kexec...
Setting up binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up makedumpfile (1:1.6.3-2) ...
Setting up libsnappy1v5:amd64 (1.1.7-1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Setting up kdump-tools (1:1.6.3-2) ...

Creating config file /etc/default/kdump-tools with new version
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Sourcing file `/etc/default/grub.d/kdump-tools.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-45-generic
Found initrd image: /boot/initrd.img-4.15.0-45-generic
done
Created symlink /etc/systemd/system/multi-user.target.wants/kdump-tools.service → /lib/systemd/system/kdump-tools.service.
Setting up linux-crashdump (4.15.0.46.48) ...
Setting up binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Setting up binutils (2.30-21ubuntu1~18.04) ...
Setting up crash (7.2.1-1ubuntu2) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Within the installation you have to answer these questions:


After the installation the following parameter is added to the kernel cmdline:
grep -r crash /boot* |grep cfg
/boot/grub/grub.cfg:        linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
/boot/grub/grub.cfg:            linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
with
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end], where 'start' is inclusive and 'end' is exclusive

The configuration is done via /etc/default/kdump-tools. Here is the parameter that controls the directory the core is dumped into:

cat /etc/default/kdump-tools  |grep DIR
# KDUMP_COREDIR - local path to save the vmcore to.
KDUMP_COREDIR="/var/crash"
The next step is to reboot and verify the kernel cmdline.

#cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro maybe-ubiquity crashkernel=384M-:128M


To get a coredump just use the following commands:
root@ubuntuserver:/etc# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
root@ubuntuserver:/etc# echo c > /proc/sysrq-trigger

Contextual Targeting vs Behavioral Targeting

VitalSoftTech - Tue, 2019-06-18 12:19

Let’s suppose these are the olden times and you have to advertise for a new circus in town. Do you paste the posters on the walls of places of entertainment like a movie theater, a bar, horse racing tracks, or a casino? Or do you spend a little time about town and look around for […]

The post Contextual Targeting vs Behavioral Targeting appeared first on VitalSoftTech.

Categories: DBA Blogs

PostgreSQL partitioning (8): Sub-partitioning

Yann Neuhaus - Tue, 2019-06-18 04:19

We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints and today we will have a look at sub-partitioning. Sub-partitioning means you go one step further and partition the partitions as well. It is not required to read all the posts of this series to follow this one, but if you want to, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions
  7. PostgreSQL partitioning (7): Indexing and constraints

Coming back to our range-partitioned table, this is how it currently looks:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Let’s assume you expect traffic violations to grow exponentially in 2022 because more and more cars will be on the road, and more cars mean more traffic violations. To be prepared for that you want to partition not only by year but also by month. In other words: add a new partition for 2022 but sub-partition it by month. First of all you need a new partition for 2022 that is itself partitioned as well:

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);

Now we can add the monthly partitions to the partitioned partition we just created (a way of generating these automatically is sketched after the statements):

create table traffic_violations_p_2022_jan
partition of traffic_violations_p_2022
for values from ('2022-01-01') to ('2022-02-01');

create table traffic_violations_p_2022_feb
partition of traffic_violations_p_2022
for values from ('2022-02-01') to ('2022-03-01');

create table traffic_violations_p_2022_mar
partition of traffic_violations_p_2022
for values from ('2022-03-01') to ('2022-04-01');

create table traffic_violations_p_2022_apr
partition of traffic_violations_p_2022
for values from ('2022-04-01') to ('2022-05-01');

create table traffic_violations_p_2022_may
partition of traffic_violations_p_2022
for values from ('2022-05-01') to ('2022-06-01');

create table traffic_violations_p_2022_jun
partition of traffic_violations_p_2022
for values from ('2022-06-01') to ('2022-07-01');

create table traffic_violations_p_2022_jul
partition of traffic_violations_p_2022
for values from ('2022-07-01') to ('2022-08-01');

create table traffic_violations_p_2022_aug
partition of traffic_violations_p_2022
for values from ('2022-08-01') to ('2022-09-01');

create table traffic_violations_p_2022_sep
partition of traffic_violations_p_2022
for values from ('2022-09-01') to ('2022-10-01');

create table traffic_violations_p_2022_oct
partition of traffic_violations_p_2022
for values from ('2022-10-01') to ('2022-11-01');

create table traffic_violations_p_2022_nov
partition of traffic_violations_p_2022
for values from ('2022-11-01') to ('2022-12-01');

create table traffic_violations_p_2022_dec
partition of traffic_violations_p_2022
for values from ('2022-12-01') to ('2023-01-01');
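As promised above, the twelve statements could also be generated with a small anonymous block along these lines. This is only a sketch: the partitions already exist at this point, so treat it as a template for future years rather than something to run as-is:

do $$
declare
  first_day date;
begin
  for m in 1..12 loop
    first_day := make_date(2022, m, 1);
    execute format(
      'create table traffic_violations_p_2022_%s
         partition of traffic_violations_p_2022
         for values from (%L) to (%L)'
      , to_char(first_day, 'mon')                    -- jan, feb, ...
      , first_day                                    -- inclusive lower bound
      , (first_day + interval '1 month')::date       -- exclusive upper bound
    );
  end loop;
end $$;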

Looking at psql’s output when we describe the partitioned table, not much has changed; the only difference is the keyword “PARTITIONED” that now shows up beside our new partition for 2022:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED,
            traffic_violations_p_default DEFAULT

This is where the new partitioning functions in PostgreSQL 12 become very handy:

postgres=# select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level 
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_default  | traffic_violations_p      | t      |     1
 traffic_violations_p_2013     | traffic_violations_p      | t      |     1
 traffic_violations_p_2014     | traffic_violations_p      | t      |     1
 traffic_violations_p_2015     | traffic_violations_p      | t      |     1
 traffic_violations_p_2016     | traffic_violations_p      | t      |     1
 traffic_violations_p_2017     | traffic_violations_p      | t      |     1
 traffic_violations_p_2018     | traffic_violations_p      | t      |     1
 traffic_violations_p_2019     | traffic_violations_p      | t      |     1
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | f      |     1
 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_may | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t      |     2
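Besides pg_partition_tree, PostgreSQL 12 also provides pg_partition_root and pg_partition_ancestors, which walk the hierarchy from the bottom up. A quick sketch:

select pg_partition_root('traffic_violations_p_2022_jan');
-- returns traffic_violations_p

select * from pg_partition_ancestors('traffic_violations_p_2022_jan');
-- returns the partition itself, traffic_violations_p_2022 and traffic_violations_p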

To verify that data is routed correctly to the sub-partitions, let’s add some data for 2022:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('01-01-2022')
                                     , date('12-31-2022')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

postgres=# select count(*) from traffic_violations_p_2022_nov;
 count 
-------
    30
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_dec;
 count 
-------
    31
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_feb;
 count 
-------
    28
(1 row)
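Partition pruning works across both levels as well: a query that restricts date_of_stop to a single month should only touch the corresponding monthly sub-partition, which is easy to verify with EXPLAIN (the exact plan will vary):

explain (costs off)
select count(*)
  from traffic_violations_p
 where date_of_stop >= date '2022-11-01'
   and date_of_stop <  date '2022-12-01';
-- the plan should reference traffic_violations_p_2022_nov only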

Here we go. Of course you could go even further and sub-partition the monthly partitions by day or week. You can also partition by list and then sub-partition the list partitions by range, or partition by range and then sub-partition by list. Here, for example, a list partition is sub-partitioned by range:

postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop);
CREATE TABLE
postgres=# create table traffic_violations_p_list_dummy_2019 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01');
CREATE TABLE
postgres=# insert into traffic_violations_p_list (seqid, violation_type , date_of_stop) values (-1,'dummy',date('2022-12-01'));
INSERT 0 1
postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2019;
 date_of_stop | violation_type 
--------------+----------------
 2022-12-01   | dummy
(1 row)

That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.

The article PostgreSQL partitioning (8): Sub-partitioning appeared first on Blog dbi services.

Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector

VitalSoftTech - Mon, 2019-06-17 09:49

The Oracle Trace File Analyzer utility was originally developed by Oracle to help collect and bundle up all the pertinent diagnostic data from log files, trace files, OS statistics, etc. This is a very common task when Oracle Support engineers request this information to help troubleshoot issues and bugs.

The post Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector appeared first on VitalSoftTech.

Categories: DBA Blogs
