Skip to content

need scalar queries to work - stuck - help please #160

Open
@nickknyc

Description

@nickknyc

@pamelafox
I have been trying for a few days to adapt the system to work with my data - a single table of items - let's say they are cars

I have modified following the guidance in customizing_data.md and the other data branch. I have looked through other branches, am just not having any luck - example queries, few shot, answer.txt, query.txt and searcher examples below.

I am really not sure where the structured logic of things like queries and creating a ton of functions to handle the possible query types - and how much to rely on the prompts to do things. For instance, when I ask for a list of something I tend to get 3 full car descriptions. I realize the 3 is probably being set in the developer settings.

Guidance here would be greatly appreciated! I realize this project is 5 months old and as an ex-microsoftee, I know you are moving ahead at full speed, but I have found this template extremely helpful in accelerating my client's embracing AI and Azure for all AI related workloads. (I fully understand this is not intended for production use, but it has many best practices built into the solution.

Specifically, I would like to know:

what is the extent of the power of the prompts, what aspects of system behavior should I lean on them for (see below)
how much should I be modifying QueryRewriter and PostgresSearcher, how generic should those changes be (see below)
exactly what components in the FrontEnd need to be modified to handle different response types - say lists, citations under each response in a field that toggles visibility like in a file browser.
Examples of what I have been trying - all at the same time - app runs and deploys file rarely crashes on a query.

trying to do is get support for queries like:

Q: how many 1989 Ford's do we have? I want the answer to be something like

A: we have 12 Ford cars in inventory

  • follow up question would you like to see a list of all of the?

Q: Which is the range of reliability ratings?
A: The cars in inventory range from 3 to 4.5 safety ratings

  • follow up would you like to see a list of them sorted by rating?

Q: of the different manufacturers, which is the most popular?
A: the Subaru Outback is the most popular, using the formula of the sum of the count of the number of votes for each star rating (1-5) /the number of votes
Q: what adjective word occurs the most in the comments section
A: happy was mentioned in 20 out of 120 reviews

I have tried modified the following:

QueryRewriter Updates - a bunch like this

                "select": {
                    "type": "object",
                    "description": "Specify scalar operations like COUNT, AVG, etc.",
                    "properties": {
                        "operation": {
                            "type": "string",
                            "enum": ["COUNT", "AVG", "SUM", "MIN", "MAX"],
                            "description": "The scalar operation to perform",
                        },
                        "field": {
                            "type": "string",
                            "description": "The field to operate on, e.g. '*' for COUNT(*) or 'metrics->>'Views'' for views",
                        },
                    },
                    "required": ["operation", "field"],
                },

PostgresSearcher Updates:

Implement scalar operations for numeric metrics
Add type coercion for mixed string/number metrics
Example query: metrics->>'Views'::numeric for consistent numeric handling
QueryRewriter Updates

Add metric name patterns to query generation
Handle numeric comparisons in natural language
Update few-shot with entries like
{
"role": "user",
"content": "how many Ford cars do we have?"
},
{
"role": "assistant",
"tool_calls": [{
"function": {
"name": "search_database",
"arguments": {
"return_scalar": {
"type": "count",
"expression": "*"
},
"filters": {
"Brand": "ig",
"category": "cars"
}
}
}
}]
}
Answers.txt

Assistant helps users find relevant cars and trucks.
Respond as if you are a knowledgeable content curator. Do NOT respond with tables.
Answer ONLY with the content details listed in the sources.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.

For scalar query results i.e. how many, highest rating, (like COUNT, AVG, etc):

  • Look for the metrics.scalar_result field which contains the actual numeric value
  • Present the value clearly and directly in a complete sentence
  • Include the [scalar_result] citation
    query.txt

You are a query generator that helps translate natural language questions into search queries or scalar operations.

For scalar operations (like counting, averaging, finding max/min), generate a scalar operation with:

select_expr: The expression to compute (e.g. MAX, COUNT, AVG)
group_by: Optional list of columns to group by
order_by: Optional list of columns to order by
filters: List of filters to apply
Examples:

"How many Ford cars are there?" -> COUNT(*) with brand = 'ford' and category='car'
"What vehicle has the most likes?" -> MAX((metrics->>'Likes')::numeric)
"Average price of Subarus?" -> AVG((metrics->>'price')::numeric) with brand = 'subaru'
For regular search queries, generate a search query based on the conversation.
If the question is not in English, translate it to English before generating the query.

If you cannot generate a search query, return the original user question.

Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching database rows.
You have access to an Azure PostgreSQL database with a posts table that has the following columns:

Text fields: model, brand, category, description
Arrays: reviews, stars, features
JSON fields: ownership_records (containing key value pairs of owner_name, puchase_date, Purchase_location, sale_date, sale_location )
For questions asking about counts, averages, sums, or other aggregate metrics:

Use the "select" operation with COUNT, AVG, SUM, etc.
Example: "how many Subaru wagons? are in stock" -> use COUNT(*) with brand="subaru" and category= "wagon"
Example: "average rating of fords?" -> use AVG(metrics->>'rating') with brand="ford"
You can combine with filters like channel, date ranges, etc.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions