Feed aggregator

Converting columns from one data type to another in PostgreSQL

Yann Neuhaus - Mon, 2019-07-08 00:19

Usually you should use the data type that best fits the representation of your data in a relational database. But how many times did you see applications that store dates or numbers as text or dates as integers? This is not so uncommon as you might think and fixing that could be quite a challenge as you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data it might be easy to fix or it might become more complicated. PostgreSQL has a quite clever way of doing that.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: Column “a” is an integer and column “b” is of type text. For humans it seems obvious that the second column in reality contains a date but stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? It will probably break the application, as the column name changes, and there is no way to avoid that. Is there a better way? Let’s start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert "b" to a real date without changing the name of the column? Let's try the most obvious way first and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work, as PostgreSQL in this case cannot know how to go from one data type to the other. But the “HINT” already tells us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

For our data in the “b” column that does work. But consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do this conversion, it fails — no surprise here. But you still have the power to do it by providing a function that implements the conversion in exactly the way you want:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function will return the matching date and not a constant as in this example. For demonstration purposes we will go with this fake function:

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact conversion logic by calling our own function. As long as the output of the function conforms to the target data type and you did not make any mistakes, you can potentially go from any source data type to any target data type.
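Just to illustrate what a real implementation could look like for the fake data used above, here is a sketch (it assumes the day words are limited to the ones in the demo and that the two-digit year means 20xx):

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
  lv_day   text := split_part(pv_text,'--',1);
  lv_month text := split_part(pv_text,'--',2);
  lv_year  text := split_part(pv_text,'--',3);
  lv_dd    int;
begin
  -- map the spelled out day words of the demo data to numbers
  lv_dd := case lv_day
             when 'First'  then 1
             when 'Second' then 2
             when 'Third'  then 3
           end;
  -- assumption: the two-digit year means 20xx
  return to_date(lv_dd || '-' || lv_month || '-20' || lv_year, 'DD-Month-YYYY');
end;
$$ language plpgsql;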

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

Second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks

Yes, that will block, so you should plan such actions carefully when you have a busy system. But this is still better than adding a new column.
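If you want to see what exactly is waiting while the conversion is running, a quick look at pg_stat_activity from a third session shows the blocked query and its wait event (these columns exist in recent PostgreSQL versions):

postgres=# select pid, wait_event_type, wait_event, state, query
             from pg_stat_activity
            where state <> 'idle';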


Telling the PostgreSQL optimizer more about your functions

Yann Neuhaus - Sun, 2019-07-07 05:29

When you reference or call functions in PostgreSQL the optimizer does not really know much about the cost, nor about the number of rows a function returns. This is not really surprising, as it is hard to predict what a function is doing and how many rows will be returned for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned, but the optimizer assumes 1000 rows. This is the documented default: no matter how many rows really will be returned, PostgreSQL estimates 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows a function will return. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (based on cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;
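Both settings end up in the catalog, so you can verify what the planner now knows about the function by looking at prorows and procost in pg_proc:

postgres=# select proname, prorows, procost
             from pg_proc
            where proname = 'f_tmp';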

If you use set-returning functions, remember that you can give the optimizer more information and that the default row estimate is 1000.


SQL Server containers and docker network driver performance considerations

Yann Neuhaus - Fri, 2019-07-05 01:45

A few months ago I attended Franck Pachot’s session about microservices and databases at SOUG Romandie in Lausanne, on May 21st 2019. He covered some performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer had installed some SQL Server 2017 Linux containers in a Docker infrastructure with the user applications located outside of this infrastructure. It is probably an uncommon way to start with containers, but anyway, when you immerse yourself in the Docker world you quickly notice that there are a lot of network drivers and considerations to be aware of, and just for the sake of curiosity I proposed to my customer to perform some network benchmark tests to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The customer’s initial scenario included a standalone Docker infrastructure, and we considered different approaches to the application network configuration from a performance perspective. We did the same for the second scenario, a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference with no network management overhead, directly from the network host. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to make sure the network throughput is what we really expect, and I have had some surprises on this topic in the past. So, back to the container world: each test was performed from a Linux host outside the concerned Docker infrastructure, according to the customer scenario.

The link speed of the network card attached to the Docker host is supposed to be 10 Gbit/sec …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s
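For completeness, the client side of each test was driven from the external Linux host with something along these lines (host name, port and duration here are placeholders rather than the exact values we used):

$ iperf3 -c docker1.dbi-services.test -p 5201 -t 30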

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container’s network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a Docker container will use this bridge by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn’t matter for the tests we performed here.

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 Docker container I ran for my tests is using the default bridge network, as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the concerned container.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here the output after running the iperf3 benchmark:

It’s worth noting that the bridge network adds some overhead, with an impact of 13% in my tests, but this is an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before looking at the iperf3 result, I think we have to answer the million-dollar question here: what is the docker-proxy? Have you ever paid attention to this process on your Docker host? Let’s run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the container providing the service … but it is only used in particular circumstances. Just bear in mind that controlling access to a container’s service is mostly done through the host netfilter framework, in both the NAT and filter tables, and the docker-proxy mechanism is required only when this method of control is not available:

  • When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the kernel parameter net.ipv4.ip_forward is set to 0. This is not my case here.
  • When you are using localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel does not allow routing traffic from it. Therefore it is not possible to apply netfilter NAT rules, and instead netfilter sends packets through the filter table’s INPUT chain to a local process listening on the port published by docker-proxy:
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I’m using the localhost keyword in my connection string, so the docker-proxy comes into play.

A huge performance impact for sure, at about 28%. This performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I could simply disable it with the Docker daemon parameter “userland-proxy”: false, but I would say this is a case we would not encounter in practice, because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the Docker host gives a very different outcome, similar to the Docker bridge network scenario.
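For reference, disabling the userland proxy is done in the daemon configuration file, typically /etc/docker/daemon.json, followed by a restart of the Docker daemon (we did not change this for the tests described here):

{
  "userland-proxy": false
}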

 

Using an overlay network

Using a single Docker host doesn’t fit well with HA or scalability requirements, and I strongly doubt any customer will go this way in a mission-critical environment. I recommended that my customer consider an orchestrator like Docker Swarm or K8s to anticipate the container workload coming from future projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. According to the Docker documentation, overlay networks manage communications among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the Docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.
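For reference, a user-defined and attachable overlay network could be created like this (we stayed with the default networks for these tests; the network name is just a placeholder):

$ docker network create --driver overlay --attachable demo-overlay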

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm

 

Here is the result of the iperf3 benchmark:

Well, the same result as the previous test, with roughly a 30% performance drop. Compared to the initial reference this is again an expected outcome, but I didn’t expect the impact to be that big in such a case. The overlay network introduces additional overhead by putting together, behind the scenes, a VXLAN tunnel (a virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for the encapsulation/de-encapsulation work, and traffic encryption by default.

Here is a summary of the different scenarios and their performance impact:

Scenario                   Throughput (Gbit/s)   Performance impact
Host network               10.3
Docker host network        10.3
Docker bridge network      8.93                  0.78
Docker proxy               7.37                  0.71
Docker overlay network     7.04                  0.68

 

In the particular case of my customer, where the SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it’s clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with only a few SQL Server containers. But in this case we have to change the SQL Server default listen port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn’t provide port mapping capabilities. According to our tests, we didn’t get any evidence of performance improvement in terms of application response time between the Docker network drivers, probably because those applications are not network bound here. But I can imagine scenarios where they could be. Finally, the kind of scenario encountered here is likely uncommon, and I see containerized apps with database components outside the Docker infrastructure more often, but that doesn’t change the game at all and the same considerations apply … Today I’m very curious to test real microservices scenarios where database and application components all sit on a Docker infrastructure.
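As a rough sketch of what that could look like (the image tag, password and port below are placeholders, not the customer’s values), running a SQL Server container on the host network with a non-default port would be something like:

$ docker run -d --name sql1 --net=host \
     -e "ACCEPT_EULA=Y" \
     -e "SA_PASSWORD=YourStrong!Passw0rd" \
     -e "MSSQL_TCP_PORT=1455" \
     mcr.microsoft.com/mssql/server:2017-latest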

See you!

 


Using DbVisualizer to work with #Oracle, #PostgreSQL and #Exasol

The Oracle Instructor - Tue, 2019-07-02 09:01

As a Database Developer or Database Administrator, it becomes increasingly unlikely that you will work with only one platform.

It’s quite useful to have one single tool to handle multiple different database platforms. And that’s exactly the ambition of DbVisualizer.

As a hypothetical scenario, let’s assume you are a database admin who works on a project to migrate from Oracle to EDB Postgres and Exasol.

The goal might be to replace the corporate Oracle database landscape, moving the OLTP part to EDB Postgres and the DWH / Analytics part to Exasol.

Instead of having to switch constantly between say SQL Developer, psql and EXAplus, a more efficient approach would be using DbVisualizer for all three.

I created one connection for each of the three databases here for my demo:

Now let’s see if statements I do in Oracle also work in EDB Postgres and in Exasol:

Oracle

EDB

Exasol

Works the same for all three! The convenient thing here is that I just had to select the Database Connection from the pull down menu while leaving the statement as it is. No need to copy & paste even.

What about schemas and tables?

Oracle

In EDB, I need to create a schema accordingly:
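(The screenshots are not reproduced in this text version; conceptually the step boils down to something like the following, where the schema and table names are just made-up placeholders:)

create schema demo_schema;
create table demo_schema.t1 ( id number, name varchar2(50) );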

EDB

 

In Exasol, schema and table can be created in the same way:

Exasol

Notice that the data types got silently translated into the proper Exasol data types:

Exasol

There is no DBA_TABLES in Exasol, though:

Exasol

Of course, there’s much more to check and test for a migration, but I think you get an idea of how a universal SQL client like DbVisualizer might help with such a project.

 


opt_estimate 4

Jonathan Lewis - Mon, 2019-07-01 07:18

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t necessary.)

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint, and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature, after creating a table t2 as a copy of t1 but with pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat: hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual figure in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan, and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

Modifying pg_hba.conf from inside PostgreSQL

Yann Neuhaus - Sat, 2019-06-29 07:14

During one of the sessions at the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there currently is no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions there is a way to do it, though. First, let’s check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information we can load the file into a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
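The new rule should now also show up in pg_hba_file_rules, and the error column of that view is a convenient way to verify that the file is still valid:

postgres=# select line_number, type, database, user_name, error
             from pg_hba_file_rules
            where error is not null;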

Of course: Use with caution!
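If you need this more than once, the whole procedure can be wrapped into a function along the lines of the following sketch (it assumes superuser privileges, relies on the server-side COPY used above, and the function name is of course made up):

create or replace function append_hba_rule ( pv_rule text ) returns boolean
as $$
declare
  lv_hba_file text;
begin
  -- location of pg_hba.conf as reported by the running instance
  select setting into lv_hba_file from pg_settings where name = 'hba_file';
  -- load the current file, append the new rule and write everything back
  create temporary table hba_tmp ( lines text ) on commit drop;
  execute format('copy hba_tmp from %L', lv_hba_file);
  insert into hba_tmp ( lines ) values ( pv_rule );
  execute format('copy hba_tmp to %L', lv_hba_file);
  -- make the server re-read its configuration
  return pg_reload_conf();
end;
$$ language plpgsql;

Adding a rule then becomes a single call like select append_hba_rule('host  all  mydb  ::1/128  trust');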


opt_estimate 3

Jonathan Lewis - Fri, 2019-06-28 07:12

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are likely to be most useful, since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause, so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by option I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries: first the unhinted baseline (which uses the standard 5% for my having clause), then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

Should the Oracle APEX Community Care About Autonomous Database?

Joel Kallman - Fri, 2019-06-28 06:14


This past week, Oracle announced the availability of Oracle APEX, SQL Developer Web and Oracle REST Data Services on Oracle Autonomous Database.  If you're in the APEX community, should you care?  I say "absolutely yes!", but not for the reasons you might suspect.

Autonomous Database is strategic to Oracle.  Just read the transcript from the recent Oracle quarterly earnings conference call and it will be obvious to you.  Autonomous is an advancement in technology that has significant investment from Oracle and very real benefits for customers.  It's a clear market differentiator - I do truly believe this, it's not merely my marketing spin.  And now, with the addition of Oracle APEX & SQL Developer Web & Oracle REST Data Services, I think this combination of technologies provides even more capabilities to this platform and even greater differentiation.  What other service provides elastic, autonomous capabilities, application design and proven low code application development, out-of-the-box?  Did I mention that this also happens to include the world's most popular database, Oracle Database?

The benefits of low code application development are real.  And Low Code + Autonomous Database is the ideal combination.  Low code is about reducing costs, delivering faster, with greater consistency, and being usable by a broader range of skill sets.  Some of the benefits of Autonomous Database are equivalent - less cost, instant availability, usable by others who may not be world-class experts.  It has been a long multi-year confluence of events that has brought us together here.

The APEX community is the envy of others at Oracle.  Even people who aren't APEX fans recognize the APEX community's passion.  But where did this come from?  Do people really get excited about a tool?  No.  They get excited about what they can do with a tool - how it helps them deliver a solution, and be successful.  A carpenter doesn't get passionate about his dual-slide compound miter saw because it's a cool tool.  He gets satisfaction about what he can actually do with that tool versus a hand saw.  When you get a pay raise or praise or a promotion because of what you've been able to deliver with APEX and ORDS and Oracle Database, that's a reason to get excited!  And I think that is ultimately the real story behind the enviable, tangible energy in the APEX community.  Countless people have had many great successes with this combination of technologies, and success begets success.

Let's say you're in the APEX community, you saw this announcement about APEX on Autonomous, but you're not interested in cloud.  Or, as Andre de Souza so eloquently stated on Twitter, "I know it’s big news, just does not affect 99,9% of current #orclapex developers I’m guessing."  Should you care?  I say yes, and here's why.  The great APEX community that I mention above, which has been so successful with APEX & ORDS & Oracle Database over the years, has become very large across the globe, and with not a lot of help from Oracle.  Make no mistake - Oracle does invest in APEX, millions of dollars every year.  But I still come across Oracle Database customers who have simply never heard of APEX.  This is because there has not been much promotion from Oracle marketing or public relations or even sales.  All of this is about to change.  Why?  Because APEX is on Autonomous Database, and Autonomous Database is strategic to Oracle.  You will probably see more communication and discussion from Oracle about APEX than in the last 20 years combined.  Low code resonates with customers, APEX is proven, and everyone has application development needs.

How does this benefit someone in the APEX community?  Simple:

  1. Awareness and interest will rise among people who have never heard about APEX before, both existing on-premises customers and net new customers.
  2. There will be greater demand for APEX and database development talent.  If you have experience with APEX, with a proven track record of delivering solutions with APEX, you're a very attractive person.  Perhaps the rate you charge has now gotten a bit higher.  You'll certainly gain upward mobility.
  3. You'll no longer have to introduce someone to APEX for the very first time, or counter the claim that "it's not strategic."
  4. As our friends from Explorer UK say, with APEX, they "develop cloud ready applications".  And you've been doing this for years.  Don't be afraid to make this claim.  When and if you're ready for cloud, you're already out of the gate.  The same APEX apps you developed on-premises run and look and feel exactly the same in the cloud.  Who has been developing cloud-ready apps for years?  You!

So.  Even if you're not into "cloud" but into APEX, this announcement and these capabilities on Autonomous Database have a material impact on you and everyone else in the APEX community.  Your skills and experience will become more valued, and we should expect the market and interest and demand to grow.

Everything is not perfect, and we on the APEX team still have a lot of very hard work ahead of us.  But these are exciting times and it's what we've labored on for the past 20 years, to get to this point.  For those who have been with the APEX community for so many years, congratulations!  You've bet on the right horse.  Just fasten your seat belt.

Windocks and K8s support

Yann Neuhaus - Fri, 2019-06-28 00:33

I recently got the 4.08 update from the Windocks team and I was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let’s say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks, we had to write custom scripts to deploy applications on K8s that were tied to a Windocks container. With the new version 4.08, this process may be simplified because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists of deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks Server. Once the SQL Server proxy is deployed, a corresponding Windocks container is spun up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks Server, authentication is required and we need to provide credential information that will be stored in a K8s secret (proxy-secrets below). The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'
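As a quick sanity check (my addition, not part of the original post), you can confirm that the three keys landed in the secret – kubectl describe only lists the key names and byte sizes, so the values themselves are not echoed:

$ kubectl describe secret proxy-secrets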

 

The next step consists of deploying the Windocks SQL proxy with the specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (the Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (the Windocks base image used for the container) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"
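
A side note from me: the manifest above targets the API versions available at the time of writing. On Kubernetes 1.16 and later the extensions/v1beta1 Deployment API has been removed, so a roughly equivalent (sketched, not tested here) header would be apps/v1 with a mandatory selector matching the pod labels:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sqlproxy-secure
      tier: frontend
  template:
    # ... same pod template (labels, containers, envFrom/env) as shown above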

 

If we want to make the SQL Proxy pod accessible from outside, a service is needed, but this is not mandatory depending on the context. Note that you may also use a TLS connection to secure the network traffic between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend

 

Let’s give it a try on my Azure infrastructure, which includes an AKS cluster and a Windocks Server installed in an Azure VM. I also took the opportunity to create my own helm chart from the YAML files provided by the Windocks team. It will certainly make my deployments easier. Here is the command I used to deploy my Windocks helm chart on my AKS cluster.

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .
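
As a rough sketch only (the chart is my own and its full structure is not shown here), the values.yaml behind those --set switches might look something like this:

Windocks:
  Image: 2012_ci      # Windocks base image used for the container
  Port: 3089          # TCP port published by the SQL proxy service
  PortSSL: 3090       # TLS port published by the SQL proxy service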

 

The deployment is performed in a specific namespace named dmk, and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container through the SQL Proxy deployed on K8s, using port 3089. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL Proxy pod and the Windocks SQL Proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy redirects all connections from port 3089 to the container port after spinning up the corresponding Windocks container on the Windocks server. We can get some details by taking a look at the SQL Proxy logs on K8s. As a reminder, the container port is allocated dynamically by the Windocks server by default, and the SQL proxy picks it up automatically for connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let’s try to connect by using mssql-cli with the external IP of the SQL Proxy service and port 3089. The connection redirect is effective and I can interact with my Windocks container on local port 10001, as the query output below shows.
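The mssql-cli invocation itself is not reproduced here; as a hedged sketch (the external IP is masked in the service listing above, so the address below is purely illustrative, and the SA password is the one stored earlier in the proxy-secrets secret), the connection might look like this:

$ mssql-cli -S xx.xx.xxx.xxx,3089 -U sa -P 'sa_password'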

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server came up with my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server, I can get a picture of the provisioned containers. The interesting one in our case is referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell, this is a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 

This article Windocks and K8s support first appeared on Blog dbi services.

work Agile in a GxP-regulated environment

Yann Neuhaus - Thu, 2019-06-27 09:24

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping for learning) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel was to see some very special type of literature that day. Not a formal presentation but a role play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. With its rigidity and intended plannability it of course corresponds largely to the waterfall model of traditional software development. Opposed to this, the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like “Working software over comprehensive documentation” and “Responding to change over following a plan”. But how would you ever get acceptance in the compliance world without full documentation and planning!?! When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. This time, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management this is reflected in the rigid rules of testing and documentation, to name just the most obvious examples. The background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance, quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules has to go through complicated, formal administrative procedures and is hence very slow. Consider this in our fast-moving (not only IT) world!

What is an MVP?

A nice play on words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, it is just not the same one: MVP = Master Validation Plan (in GxP Validation) versus Minimum Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile Development like Customer focus, Flexibility and Speed into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Inclusion of Validation into the “daily” Development activities is another must, instead of adding Validation only at the end of the Development phases. Yet another core principle from the Agile side is ensuring team Maturity and Mindset. Much-hailed Diversity is opposed to pure compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to – methodically – combine or maybe rather “emulsify” Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do it.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com)  for organizing this creative and informative event.

 

This article work Agile in a GxP-regulated environment first appeared on Blog dbi services.

Glitches

Jonathan Lewis - Wed, 2019-06-26 11:11

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a risk that the whole data set for ‘clerk’ will be accessed – that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column which holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “fetch first” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data

Yann Neuhaus - Wed, 2019-06-26 10:23

For once, a blog post in French, since it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva there was a presentation on Oracle security.
The idea was to put ourselves in a hacker's shoes in order to better understand how they operate, and then to look at how to protect the data in an Oracle database.
Understanding the intentions and the means hackers use to achieve their goals helps us fight them more effectively.

The session was led by Hakim Loumi – EMEA DB Security PM.
First, the speaker showed an estimate of data growth by 2025. Wow, 175 ZB!

And how sensitive that data can be.

The number of attacks worldwide is impressive.

Hakim also showed why hacking has become one of the biggest scourges. In the slide below, for an investment of less than $10, a complete identity was worth $240 on average. What a return on investment!

Obviously, the database is a prime target for attacks.

The speaker then presented the main tools provided by Oracle, most of them requiring the Advanced Security Option:

Data Masking
Data Redaction
Oracle Firewall
Database Vault

And to finish, this nice slide.

Conclusion

The event was really interesting. Through simple examples and anecdotes, the speaker managed to hold the audience's attention. The main takeaway from this presentation is that data security is not just the DBA's business: it is a whole chain of procedures involving everyone in the company.

This article Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data first appeared on Blog dbi services.

Oracle Developer Tools - Do They Still Exist?

Andrejus Baranovski - Wed, 2019-06-26 01:55
People are frustrated about @OracleADF @JDeveloper on social media - "ADF boat has no captain", etc. I agree @Oracle is to blame big time for such lame handling of its own Developer Tools stack. @Oracle please wake up and spend some budget on @OracleADF. Read more:

Oracle VBCS - right now this tool gets the most of Oracle focus. Supposed to offer declarative #JavaScript development experience in the Cloud. Not well received by the community. Are there any VBCS customers, please respond if yes?

Oracle APEX - comes with a very strong community (mostly backed by DB folks). But it is not strategic for Oracle. It is more likely to be used by PL/SQL guys than by Java or Web developers.

Oracle JET - highly promoted by Oracle. Set of opensource #JavaScript libs, glued by Oracle layer. Nice, but can't be used as a direct replacement for @OracleADF, JET is UI layer only. Oracle folks often confuse community by saying - Oracle JET is a great option to replace ADF

Oracle Forms - still alive, but obviously can't be strategic Oracle platform. A few years ago, Oracle was promoting Forms modernization to @OracleADF

Summary - Oracle's developer tools offering is weak. The lack of Oracle investment in development tools makes the Oracle developer community shrink.

opt_estimate 2

Jonathan Lewis - Tue, 2019-06-25 14:22

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the appearance of the nested loop goes back to the original pre-9i form that happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15*15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
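
Just to illustrate the rows=nnn form mentioned above (my sketch, not one of the original test cases), the equivalent hint asking for a fixed estimate of one row per probe – which in this example should have much the same effect as scale_rows=0.06, since 0.06 * 15 rounds to 1 – would be:

opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, rows=1)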

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale its estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: If you have access to MoS you’ll find that Doc ID 2402821.1, “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

SQLcl ALIAS – because you can’t remember everything.

The Anti-Kyte - Tue, 2019-06-25 08:47

I want to find out which file is going to hold any trace information generated by my database session. Unfortunately, I keep forgetting the query that I need to run to find out.
Fortunately I’m using SQLcl, which includes the ALIAS command.
What follows is a quick run-through of this command including :

  • listing the aliases that are already set up in SQLcl
  • displaying the code that an alias will execute
  • creating your own alias interactively
  • deleting an alias
  • using files to manage custom aliases

Whilst I’m at it, I’ll create the alias for the code to find that pesky trace file too.

In the examples that follow, I’m connected to an Oracle XE18c PDB using SQLcl 18.4 from my Ubuntu 16.04 LTS laptop via the Oracle Thin Client. Oh, and the Java details are :

Meet the ALIAS command

As so often in SQLcl, it’s probably a good idea to start with the help :

help alias

…which explains that :

“Alias is a command which allows you to save a sql, plsql or sqlplus script and assign it a shortcut command.”

A number of aliases are already included in SQLcl. To get a list of them simply type :

alias

…which returns :

locks
sessions
tables
tables2

If we want to see the code that will be run when an alias is invoked, we simply need to list the alias :

alias list tables

tables - tables <schema> - show tables from schema
--------------------------------------------------

select table_name "TABLES" from user_tables

Connected as HR, I can run the alias to return a list of tables that I own in the database :

Creating an ALIAS

To create an alias of my own, I simply need to specify the alias name and the statement I want to associate it with. For example, to create an alias called whoami :

alias whoami =
select sys_context('userenv', 'session_user')
from dual;

I can now confirm that the alias has been created :

alias list whoami
whoami
------

select sys_context('userenv', 'session_user')
from dual

…and run it…

I think I want to tidy up that column heading. I could do this by adding a column alias in the query itself. However, the alias command does support the use of SQL*Plus commands…

alias whoami =
column session_user format a30
select sys_context('userenv', 'session_user') session_user
from dual;

…which can make the output look slightly more elegant :

A point to note here is that, whilst it is possible to include SQL*Plus statements in an alias for a PL/SQL block (well, sort of)…

alias whoami=set serverout on
exec dbms_output.put_line(sys_context('userenv', 'session_user'));

…when the alias starts with a SQL*Plus statement, it will terminate at the first semi-colon…

Where you do have a PL/SQL alias that contains multiple statement terminators (‘;’) you will need to run any SQL*Plus commands required prior to invoking it.
Of course, if you find setting output on to be a bit onerous, you can save valuable typing molecules by simply running :

alias output_on = set serverout on size unlimited
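
With that alias in place, a working pattern (my sketch – it assumes whoami has been redefined to contain just the dbms_output call, without the leading set serverout) would be to run the two aliases in sequence :

output_on
whoami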

I can also add a description to my alias so that there is some documentation when it’s listed :

alias desc whoami The current session user

When I now list the alias, the description is included…more-or-less…

I’m not sure if the inclusion of the text desc whoami is simply a quirk of the version and OS that I’m running on. In any case, we’ll come to a workaround for this minor annoyance in due course.

In the meantime, I’ve decided that I don’t need this alias anymore. To remove it, I simply need to run the alias drop command :

alias drop whoami


At this point, I know enough about the alias command to implement my first version of the session tracefile alias that started all this.
The query that I keep forgetting is :

select value
from v$diag_info
where name = 'Default Trace File'
/

To create the new alias :

alias tracefile =
select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File';

I’ll also add a comment at this point :

alias desc tracefile The full path and filename on the database server of the tracefile for this session

My new alias looks like this :
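Listing it shows something along these lines (allowing for the description quirk mentioned earlier) :

alias list tracefile

tracefile
---------

select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File'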

The aliases.xml file

Unlike the pre-supplied aliases, the code for any alias you create will be held in a file called aliases.xml.

On Windows, this file will probably be somewhere under your OS user’s AppData directory.
On Ubuntu, it’s in $HOME/.sqlcl

With no custom aliases defined the file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases/>

Note that, even though I have now defined a custom alias, it won’t be included in this file until I end the SQLcl session in which it was created.

Once I disconnect from this session, the file includes the new alias definition :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="tracefile">
<description><![CDATA[desc tracefile The full path and filename on the database server of the tracefile for this session
]]></description>
<queries>
<query>
<sql><![CDATA[select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File']]></sql>
</query>
</queries>
</alias>
</aliases>

Incidentally, if you’ve played around with SQLDeveloper extensions, you may find this file structure rather familiar.

The file appears to be read by SQLcl once on startup. Therefore, before I run SQLcl again, I can tweak the description of my alias to remove the extraneous text…

<description><![CDATA[The full path and filename on the database server of the tracefile for this session]]></description>

Sure enough, next time I start an SQLcl session, this change is now reflected in the alias definition :

Loading an alias from a file

The structure of the aliases.xml file gives us a template we can use to define an alias in the comfort of a text editor rather than on the command line. For example, we have the following PL/SQL block, which reads a bind variable :

declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
/

Rather than typing this in on the command line, we can create a file (called pep_talk.xml) which looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="pep_talk">
<description><![CDATA[How are you feeling ? Usage is pep_talk <emotion>]]></description>
<queries>
<query>
<sql><![CDATA[
declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
]]></sql>
</query>
</queries>
</alias>
</aliases>

Now, we can load this alias from the file as follows :

alias load pep_talk.xml
Aliases loaded

We can now execute our new alias. First though, we need to remember to turn serveroutput on before we invoke it :
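Assuming SQLcl binds the argument to :mood (which is what the usage note in the description implies), a run might look something like this, with the message varying according to the day and time :

set serverout on size unlimited
pep_talk bad

At least it's not Monday!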

Once you’ve terminated your SQLcl session, the new alias will be written to aliases.xml.

Exporting custom aliases

There may come a time when you want to share your custom aliases with your colleagues. After all, it’s always useful to know where the trace file is and who doesn’t need a pep talk from time-to-time ?

To “export” your aliases, you can issue the following command from SQLcl :

alias save mike_aliases.xml

This writes the file to the same location as your aliases.xml :

You can then import these aliases to another SQLcl installation simply by sharing the file and then using the alias load command.

References

As you can imagine, there are a wide variety of possible uses for the ALIAS command.

As Kris Rice is the original author of this feature, his post is probably worth a read.
Jeff Smith has written on this topic several times, including :

Menno Hoogendijk has an example which employs some Javascript wizardry which he has published on GitHub.

Right, back to my trace files.

ANSI bug

Jonathan Lewis - Sat, 2019-06-22 07:01

The following note is about a script that I found on my laptop while I was searching for some details about a bug that appears when you write SQL using the ANSI style format rather than traditional Oracle style. The script is clearly one that I must have cut and pasted from somewhere (possibly the OTN/ODC database forum) many years ago without making any notes about its source or resolution. All I can say about it is that the file has a creation date of July 2012 and I can’t find any reference to a problem through Google searches – though the tables, and even a set of specific insert statements, appear in a number of pages that look like coursework for computer studies, and MOS (My Oracle Support) has a similar-looking bug “fixed in 11.2”.

Here’s the entire script:

rem
rem     Script:         ansi_bug.sql
rem     Author:         ???
rem     Dated:          July 2012
rem

CREATE TABLE Student (
  sid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  address VARCHAR(20) NOT NULL,
  major CHAR(2)
);

CREATE TABLE Professor (
  pid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  department VARCHAR(10) NOT NULL
);

CREATE TABLE Course (
  cid INT PRIMARY KEY,
  title VARCHAR(20) NOT NULL UNIQUE,
  credits INT NOT NULL,
  area VARCHAR(5) NOT NULL
);

CREATE TABLE Transcript (
  sid INT,
  cid INT,
  pid INT,
  semester VARCHAR(9),
  year CHAR(4),
  grade CHAR(1) NOT NULL,
  PRIMARY KEY (sid, cid, pid, semester, year),
  FOREIGN KEY (sid) REFERENCES Student (sid),
  FOREIGN KEY (cid) REFERENCES Course (cid),
  FOREIGN KEY (pid) REFERENCES Professor (pid)
);

INSERT INTO Student (sid, name, address, major) VALUES (101, 'Nathan', 'Edinburg', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (105, 'Hussein', 'Edinburg', 'IT');
INSERT INTO Student (sid, name, address, major) VALUES (103, 'Jose', 'McAllen', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (102, 'Wendy', 'Mission', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (104, 'Maria', 'Pharr', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (106, 'Mike', 'Edinburg', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (107, 'Lily', 'McAllen', NULL);

INSERT INTO Professor (pid, name, department) VALUES (201, 'Artem', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (203, 'John', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (202, 'Virgil', 'MATH');
INSERT INTO Professor (pid, name, department) VALUES (204, 'Pearl', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (205, 'Christine', 'CS');

INSERT INTO Course (cid, title, credits, area) VALUES (4333, 'Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (1201, 'Comp literacy', 2, 'INTRO');
INSERT INTO Course (cid, title, credits, area) VALUES (6333, 'Advanced Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (6315, 'Applied Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (3326, 'Java', 3, 'PL');
INSERT INTO Course (cid, title, credits, area) VALUES (1370, 'CS I', 4, 'INTRO');

INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 4333, 201, 'Spring', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6315, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 4333, 203, 'Summer I', '2010', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (102, 4333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 3326, 204, 'Spring', '2008', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1201, 205, 'Fall', '2009', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1370, 203, 'Summer II', '2010', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1201, 205, 'Fall', '2009', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1370, 203, 'Summer II', '2010', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 3326, 204, 'Spring', '2001', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 6315, 203, 'Fall', '2008', 'A');

SELECT 
        pid, 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
order by pid
;

I’ve run three minor variations of the same query – the one in the middle selects two columns from a three-table join using natural joins. The first query does the same but includes an extra column in the select list, while the third query selects only the original columns but orders the result set by the extra column.

The middle query returns 60 rows – the first and third, with the “extra” column projected somewhere in the execution plan, return 13 rows.

I didn’t even have a note of the then-current version of Oracle when I copied this script, but I’ve just run it on 12.2.0.1, 18.3.0.0, and 19.2.0.0 (using LiveSQL), and the anomaly reproduces on all three versions.
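For anyone who wants to dig further, the middle query rewritten with explicit join conditions would look something like the following, assuming the natural joins resolve to cid (Transcript/Course) and pid (Professor) as the only common columns; comparing its row count with the natural-join version should show which of the two results is the correct one:

select
        p.name, tc.title
from
        Professor p
left outer join
        (
        select  t.pid, c.title
        from    Transcript t
        join    Course c
        on      c.cid = t.cid
        ) tc
on
        tc.pid = p.pid
;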

Ubuntu Server: How to activate kernel dumps

Dietrich Schroff - Fri, 2019-06-21 14:25
If you are running Ubuntu Server, you can add kdump to your system to write kernel dumps in case of sudden reboots, kernel panics, etc.

Installing is very easy:
root@ubuntuserver:/etc# apt install linux-crashdump
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 makedumpfile
Suggested packages:
  binutils-doc
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 linux-crashdump makedumpfile
0 upgraded, 11 newly installed, 0 to remove and 43 not upgraded.
Need to get 2,636 B/5,774 kB of archives.
After this operation, 26.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-crashdump amd64 4.15.0.46.48 [2,636 B]
Fetched 2,636 B in 0s (28.1 kB/s)    
Preconfiguring packages ...
Selecting previously unselected package binutils-common:amd64.
(Reading database ... 66831 files and directories currently installed.)
Preparing to unpack .../00-binutils-common_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libbinutils:amd64.
Preparing to unpack .../01-libbinutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils-x86-64-linux-gnu.
Preparing to unpack .../02-binutils-x86-64-linux-gnu_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils.
Preparing to unpack .../03-binutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libsnappy1v5:amd64.
Preparing to unpack .../04-libsnappy1v5_1.1.7-1_amd64.deb ...
Unpacking libsnappy1v5:amd64 (1.1.7-1) ...
Selecting previously unselected package crash.
Preparing to unpack .../05-crash_7.2.1-1ubuntu2_amd64.deb ...
Unpacking crash (7.2.1-1ubuntu2) ...
Selecting previously unselected package kexec-tools.
Preparing to unpack .../06-kexec-tools_1%3a2.0.16-1ubuntu1_amd64.deb ...
Unpacking kexec-tools (1:2.0.16-1ubuntu1) ...
Selecting previously unselected package libdw1:amd64.
Preparing to unpack .../07-libdw1_0.170-0.4_amd64.deb ...
Unpacking libdw1:amd64 (0.170-0.4) ...
Selecting previously unselected package makedumpfile.
Preparing to unpack .../08-makedumpfile_1%3a1.6.3-2_amd64.deb ...
Unpacking makedumpfile (1:1.6.3-2) ...
Selecting previously unselected package kdump-tools.
Preparing to unpack .../09-kdump-tools_1%3a1.6.3-2_amd64.deb ...
Unpacking kdump-tools (1:1.6.3-2) ...
Selecting previously unselected package linux-crashdump.
Preparing to unpack .../10-linux-crashdump_4.15.0.46.48_amd64.deb ...
Unpacking linux-crashdump (4.15.0.46.48) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up libdw1:amd64 (0.170-0.4) ...
Setting up kexec-tools (1:2.0.16-1ubuntu1) ...
Generating /etc/default/kexec...
Setting up binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up makedumpfile (1:1.6.3-2) ...
Setting up libsnappy1v5:amd64 (1.1.7-1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Setting up kdump-tools (1:1.6.3-2) ...

Creating config file /etc/default/kdump-tools with new version
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Sourcing file `/etc/default/grub.d/kdump-tools.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-45-generic
Found initrd image: /boot/initrd.img-4.15.0-45-generic
done
Created symlink /etc/systemd/system/multi-user.target.wants/kdump-tools.service → /lib/systemd/system/kdump-tools.service.
Setting up linux-crashdump (4.15.0.46.48) ...
Setting up binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Setting up binutils (2.30-21ubuntu1~18.04) ...
Setting up crash (7.2.1-1ubuntu2) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Within the installation you have to answer these questions:


After the installation the following parameter is added to the kernel cmdline:
grep -r crash /boot* |grep cfg
/boot/grub/grub.cfg:        linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
/boot/grub/grub.cfg:            linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
with
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end], where 'start' is inclusive and 'end' is exclusive

The configuration is done via /etc/default/kdump-tools. Here is the parameter that controls the directory the core is dumped into:

cat /etc/default/kdump-tools  |grep DIR
# KDUMP_COREDIR - local path to save the vmcore to.
KDUMP_COREDIR="/var/crash"
The next step is to reboot and verify the kernel cmdline.

#cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro maybe-ubiquity crashkernel=384M-:128M


To get a coredump just use the following commands:
root@ubuntuserver:/etc# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
root@ubuntuserver:/etc# echo c > /proc/sysrq-trigger

Contextual Targeting vs Behavioral Targeting

VitalSoftTech - Tue, 2019-06-18 12:19

Let’s suppose these are the olden times and you have to advertise for a new circus in town. Do you paste the posters on the walls of places of entertainment like a movie theater, a bar, horse racing tracks, or a casino? Or do you spend a little time about town and look around for […]

The post Contextual Targeting vs Behavioral Targeting appeared first on VitalSoftTech.

Categories: DBA Blogs

PostgreSQL partitioning (8): Sub-partitioning

Yann Neuhaus - Tue, 2019-06-18 04:19

We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints and today we will have a look at sub-partitioning. Sub-partitioning means you go one step further and partition the partitions as well. It is not required to read all the posts of this series to follow this one, but if you want to, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions
  7. PostgreSQL partitioning (7): Indexing and constraints

Coming back to our range-partitioned table, this is how it currently looks:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Let’s assume you expect traffic violations to grow exponentially in 2022 because more and more cars will be on the road, and more cars mean more traffic violations. To be prepared for that you want to partition not only by year but also by month. In other words: add a new partition for 2022 but sub-partition it by month. First of all you need a new partition for 2022 that is itself partitioned as well:

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);

Now we can add the monthly partitions to the partitioned partition we just created (a way of generating these automatically is sketched after the statements):

create table traffic_violations_p_2022_jan
partition of traffic_violations_p_2022
for values from ('2022-01-01') to ('2022-02-01');

create table traffic_violations_p_2022_feb
partition of traffic_violations_p_2022
for values from ('2022-02-01') to ('2022-03-01');

create table traffic_violations_p_2022_mar
partition of traffic_violations_p_2022
for values from ('2022-03-01') to ('2022-04-01');

create table traffic_violations_p_2022_apr
partition of traffic_violations_p_2022
for values from ('2022-04-01') to ('2022-05-01');

create table traffic_violations_p_2022_may
partition of traffic_violations_p_2022
for values from ('2022-05-01') to ('2022-06-01');

create table traffic_violations_p_2022_jun
partition of traffic_violations_p_2022
for values from ('2022-06-01') to ('2022-07-01');

create table traffic_violations_p_2022_jul
partition of traffic_violations_p_2022
for values from ('2022-07-01') to ('2022-08-01');

create table traffic_violations_p_2022_aug
partition of traffic_violations_p_2022
for values from ('2022-08-01') to ('2022-09-01');

create table traffic_violations_p_2022_sep
partition of traffic_violations_p_2022
for values from ('2022-09-01') to ('2022-10-01');

create table traffic_violations_p_2022_oct
partition of traffic_violations_p_2022
for values from ('2022-10-01') to ('2022-11-01');

create table traffic_violations_p_2022_nov
partition of traffic_violations_p_2022
for values from ('2022-11-01') to ('2022-12-01');

create table traffic_violations_p_2022_dec
partition of traffic_violations_p_2022
for values from ('2022-12-01') to ('2023-01-01');
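As promised above, the twelve statements could also be generated with a small anonymous block along these lines. This is only a sketch: the partitions already exist at this point, so treat it as a template for future years rather than something to run as-is:

do $$
declare
  first_day date;
begin
  for m in 1..12 loop
    first_day := make_date(2022, m, 1);
    execute format(
      'create table traffic_violations_p_2022_%s
         partition of traffic_violations_p_2022
         for values from (%L) to (%L)'
      , to_char(first_day, 'mon')                    -- jan, feb, ...
      , first_day                                    -- inclusive lower bound
      , (first_day + interval '1 month')::date       -- exclusive upper bound
    );
  end loop;
end $$;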

Looking at psql’s output when we describe the partitioned table, not much has changed; the only difference is the keyword “PARTITIONED” that now shows up beside our new partition for 2022:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED,
            traffic_violations_p_default DEFAULT

This is where the new partitioning functions in PostgreSQL 12 become very handy:

postgres=# select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level 
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_default  | traffic_violations_p      | t      |     1
 traffic_violations_p_2013     | traffic_violations_p      | t      |     1
 traffic_violations_p_2014     | traffic_violations_p      | t      |     1
 traffic_violations_p_2015     | traffic_violations_p      | t      |     1
 traffic_violations_p_2016     | traffic_violations_p      | t      |     1
 traffic_violations_p_2017     | traffic_violations_p      | t      |     1
 traffic_violations_p_2018     | traffic_violations_p      | t      |     1
 traffic_violations_p_2019     | traffic_violations_p      | t      |     1
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | f      |     1
 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_may | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t      |     2
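Besides pg_partition_tree, PostgreSQL 12 also provides pg_partition_root and pg_partition_ancestors, which walk the hierarchy from the bottom up. A quick sketch:

select pg_partition_root('traffic_violations_p_2022_jan');
-- returns traffic_violations_p

select * from pg_partition_ancestors('traffic_violations_p_2022_jan');
-- returns the partition itself, traffic_violations_p_2022 and traffic_violations_p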

To verify that data is routed correctly to the sub-partitions, let’s add some data for 2022:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('01-01-2022')
                                     , date('12-31-2022')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

postgres=# select count(*) from traffic_violations_p_2022_nov;
 count 
-------
    30
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_dec;
 count 
-------
    31
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_feb;
 count 
-------
    28
(1 row)
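Partition pruning works across both levels as well: a query that restricts date_of_stop to a single month should only touch the corresponding monthly sub-partition, which is easy to verify with EXPLAIN (the exact plan will vary):

explain (costs off)
select count(*)
  from traffic_violations_p
 where date_of_stop >= date '2022-11-01'
   and date_of_stop <  date '2022-12-01';
-- the plan should reference traffic_violations_p_2022_nov only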

Here we go. Of course you could go even further and sub-partition the monthly partitions by day or week. You can also partition by list and then sub-partition the list partitions by range, or partition by range and then sub-partition by list. Here, for example, a list partition is sub-partitioned by range:

postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop);
CREATE TABLE
postgres=# create table traffic_violations_p_list_dummy_2019 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01');
CREATE TABLE
postgres=# insert into traffic_violations_p_list (seqid, violation_type , date_of_stop) values (-1,'dummy',date('2022-12-01'));
INSERT 0 1
postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2019;
 date_of_stop | violation_type 
--------------+----------------
 2022-12-01   | dummy
(1 row)

That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.

The article PostgreSQL partitioning (8): Sub-partitioning appeared first on Blog dbi services.

Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector

VitalSoftTech - Mon, 2019-06-17 09:49

The Oracle Trace File Analyzer utility was originally developed by Oracle to help collect and bundle up all the pertinent diagnostic data from log files, trace files, OS statistics, etc. This is a very common task when Oracle Support engineers request this information to help troubleshoot issues and bugs.

The post Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector appeared first on VitalSoftTech.

Categories: DBA Blogs
