The network has often been viewed as "no man's land" for the DBA. Our tools may identify network latency, but they rarely go into any detail, designating the network outside our jurisdiction.
As we work through data gravity, i.e. the weight of data and the pull of applications, services, etc. toward data sources, we have to inspect what connects them to the data and slows them down. Yes, the network.
We can't begin to investigate the network without spending some time on Shannon's law, also known as the Shannon-Hartley theorem. The equation gives the maximum capacity (transmission bit rate) that can be achieved over a given channel with certain noise characteristics and bandwidth.
This theorem has been around for quite some time in the telephony world, with roots going back to a 1903 patent by W. M. Miner introducing a concept for increasing the capacity of transmission lines. Over the years, multiplexing and quantizers were introduced, but the main computation has stayed the same:
In layman's terms: the data is only going to go as fast as it can without hitting an error threshold.
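That computation is C = B × log2(1 + S/N): capacity equals bandwidth times the log of one plus the signal-to-noise ratio. Here's a quick sketch in Python; the phone-line numbers below are the classic textbook illustration, not measurements from any particular network:

```python
import math

def shannon_capacity(bandwidth_hz, signal_to_noise):
    """Shannon-Hartley: maximum error-free bit rate for a channel.

    bandwidth_hz: channel bandwidth in Hz
    signal_to_noise: linear S/N ratio (not decibels)
    """
    return bandwidth_hz * math.log2(1 + signal_to_noise)

# Classic illustration: a phone line with ~3 kHz of bandwidth
# and a 30 dB signal-to-noise ratio (S/N = 1000).
capacity = shannon_capacity(3000, 1000)
print(round(capacity))  # roughly 30,000 bits per second
```

No matter how clever the encoding, a channel with that bandwidth and noise profile can't reliably carry more than that rate.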
As DBAs, we always inspect waits in the form of latency. Latency is really just a measure of time, and you should always tune for time or you're just wasting time. Latency is the closest measure to speed when you consider the distance between two points, which, when discussing a data source and an application, are your points. This is where it gets interesting. Super low latency networks, such as InfiniBand, common in engineered systems like Exadata, aren't necessarily huge bandwidth. In comparison, standard networks can carry much higher volume, but they can't talk as "fast" on a packet-by-packet basis. These networks compete by providing extensive parallel lanes, but as we know, an individual lane simply won't be able to keep up.
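To make the latency-versus-bandwidth point concrete, here's a hypothetical back-of-the-envelope model in Python. The link numbers are illustrative assumptions, not vendor specs, but they show why a chatty, packet-by-packet workload cares far more about latency than raw bandwidth:

```python
def transfer_time_secs(payload_bytes, latency_secs, bandwidth_bytes_per_sec):
    """Naive single-stream model: one network round trip plus serialization time."""
    return latency_secs + payload_bytes / bandwidth_bytes_per_sec

# Illustrative numbers only: a low-latency fabric vs. a higher-bandwidth but
# higher-latency link, moving one small 8 KB packet (think chatty SQL traffic).
small_io = 8 * 1024
fabric = transfer_time_secs(small_io, 0.000005, 5_000_000_000)     # ~5 us latency
standard = transfer_time_secs(small_io, 0.000500, 12_500_000_000)  # ~500 us latency

print(standard / fabric)  # ~75x slower per packet, despite more bandwidth
```

For a single bulk transfer the wider pipe would win; for thousands of small round trips, the latency term dominates every time.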
Now I'm not going to go into the further areas of this theorem, including Shannon's limit, but the network, especially with the introduction of the cloud, has reared its ugly head as the newest bottleneck. There's a very good reason cloud providers like AWS have come up with Snowmobile. On every cloud project I've been on, the network has had a significant impact on its success. My advice to all DBAs is to enhance your knowledge of networking. If you didn't respect your network administrator before, you will after you do a little research… 🙂 It will serve you well as you embrace the cloud.
Data gravity and the friction it causes within the development cycle is an incredibly obvious problem in my eyes.
Data gravity suffers from the von Neumann bottleneck, a basic limitation on how fast computers can be. Put simply, it states that the speed with which data moves between where it resides and where it's processed is the limiting factor in computing speed.
OLAP, DSS and VLDB DBAs are constantly in battle with this challenge: how much data is being consumed in a process, how much must be brought from disk, and whether the processing required to create the results will "spill" to disk vs. completing in memory.
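A rough sketch of that planning math in Python (the function name and workload numbers are hypothetical, but this is essentially the check any DBA does when sizing sort or hash memory):

```python
def spills_to_disk(rows, avg_row_bytes, sort_mem_bytes):
    """Rough planner-style check: does the working set exceed available sort memory?"""
    return rows * avg_row_bytes > sort_mem_bytes

# Hypothetical workload: 50M rows at ~200 bytes each vs. 1 GB of sort memory.
print(spills_to_disk(50_000_000, 200, 1 * 1024**3))  # True: ~10 GB won't sort in memory
```

Real optimizers weigh far more than this, but the principle stands: the less data a process has to drag around, the less gravity it fights.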
Microsoft researcher Jim Gray spent most of his career looking at the economics of data, which is one of the most accurate terms for this area of technical study. He started working at Microsoft in 1995, and although he was passionate about many areas of technology, his research on large databases and transaction processing speeds is held in great respect in my world.
Now, some may say this has little to do with being a database administrator, but how many of us spend significant time on the cost-based optimizer? Moving or retrieving data has a cost, so the economics of data it is.
And this is the fundamental principle of data gravity and why DBAs get the big bucks.
If you’re interested in learning more about data gravity, DevOps and the future of DBAs, register for the upcoming webinar.
I'm off to Columbus, Ohio tomorrow for a full day of sessions on Friday for the Ohio Oracle User Group. The wonderful Mary E. Brown and her group have set up a great venue and a fantastic schedule. Next week, I'm off to SQL Saturday Vancouver to present on DevOps for the DBA to a lovely group of SQL Server attendees. It's my first time in Vancouver, British Columbia, and as it's one of the cities on our list of potential future places to live, I'm very excited to visit.
Speaking of SQL Server, Delphix's own SQL Server COE (Center of Excellence) meets twice a month to discuss various topics surrounding our much-loved Microsoft offering. This week, one of the topics was a change made to the Backup Operator role's permissions between SQL Server 2008R2 and SQL Server 2012. The feature, referred to as "File Share Scoping," was unique to 2008R2 clusters and no longer exists.
Now many may say, "but this is such an old version. We've got SQL Server 2017, right?" The challenge is, there are folks out there with 2008 instances, and it's good to know about these little changes that can make big impacts on your dependent products. This change affected products with shared backup file systems, and as we know, having access to a backup can offload a lot of potential load from a system.
Now, for my product, Delphix, we depend on read access to backup files for the initial creation of the "golden copy" that we source everything from. The change from 2008R2's File Share Scoping in SQL Server 2012 applied only to Microsoft failover clusters: access is now offered only to those with Administrator rights, where previously anyone with the Backup Operator role could attain access, too.
Our documentation clearly states that during configuration of a Delphix engine for validated sync (creation of the golden copy), the customer must grant read access on the backup shares to the Delphix OS user; it doesn't state to grant Backup Operator. As with everything, routine can spell failure: the Backup Operator role previously offered this access with 2008R2, and it was easy to assume the configuration was complete once database-level role grants were done.
Using PowerShell from the command line, note that you can't view the root of the shared drive with the file server role Backup Operator in the newer release:

PS C:\Users\user> Get-SmbShareAccess -Name "E$" | ft -AutoSize

Name ScopeName   AccountName              AccessControlType AccessRight
---- ---------   -----------              ----------------- -----------
E$   USER1-SHARE BUILTIN\Administrators   Allow             Full
E$   *           BUILTIN\Administrators   Allow             Full
E$   *           BUILTIN\Backup Operators Allow             Full
E$   *           NT AUTHORITY\INTERACTIVE Allow             Full
If you’d like to read more details on backup and recovery changes from SQL Server 2008R2 to 2012, check out the documentation from Microsoft here.
This is an extensive series of blog posts (four so far), to be followed by an ebook, a podcast and two webinars. One, to be announced soon from Oracle, is called "The DBA Diaries," and the other will be from Delphix, titled "The Revolution: From Databases and DevOps to DataOps."
The goal for all of this is to ease the transition for the database community as the brutal shift to the cloud, now underway, changes our day-to-day lives. Development continues to move at an ever-accelerating pace, and yet the DBA is standing still, waiting for the data to catch up with it all. This is a concept that many refer to as "data gravity."
The term was coined just a few years ago by Dave McCrory, a Senior VP of Platform Engineering. It began as an open discussion aimed at understanding how data impacts the way technology changes when connected with network, software and compute.
He discusses the basic limitation that "the speed with which information can get from memory (where data is stored) to computing (where data is acted upon) is the limiting factor in computing speed," called the von Neumann bottleneck.
These are essential concepts that I believe all DBAs and developers should understand, as data gravity impacts all of us. It's the reason for many enhancements to database, network and compute power. It's the reason optimization specialists are in such demand. Other tasks such as backup, monitoring and error handling can be automated, but no matter how much logic we drive into programs, nothing is as good as true optimization skill when it comes to eliminating data gravity issues. Less data, less weight; it's as simple as that.
We all know the cloud discussions are coming, and with that, even bigger challenges are felt from the gravity of data. Until then, let's just take a step back and recognize that we need some new goals and some new skills. If you'd like to learn more about data gravity but don't have time to take it all in at once, consider following it on Twitter, curated by Dave McCrory.
I’m off to Jacksonville, Fl. tomorrow to speak at SQL Saturday #649!
I've been asked what it takes to be a successful evangelist, and I've realized that what makes one successful at it is often like holding sand in your hands: no matter how tightly you hold your fists, it's difficult to contain the grains.
The term evangelist receives either very positive or very negative responses. I'm not a fan of the term, but whether you use it or call them advocates, representatives or influencers, it doesn't matter; they are essential to the business, product or technology they become the voice for.
Who do I view as successful evangelists in the communities I'm part of?
There are a number of folks I'm sure I missed whom I also admire as I interact with them and observe their contributions, but these are a few that come to mind when I think of fellow evangelists.
1. It’s Not Just About the Company
Most companies think they hire an evangelist to promote and market the company, and yet, when all you do is push out company info and company marketing, people STOP listening to you. What you say, do and are interested in should drive people to want to know more about you, including the company you work for and what that company does.
All of these folks talk about interests outside of work. They post about their lives and their interests and contribute to their communities. This is what it means to be really authentic and to set an example. People want to be more like them because they see the value they add to the world beyond just talking points.
2. They’re Authentic
Authenticity is something most find very elusive. If you're just copying what another does, there's nothing authentic about that. There's nothing wrong with finding a tip or tidbit that someone else is doing and adopting it, but it has to WORK for you. I was just part of a conversation yesterday where Jeff and I were discussing that he doesn't use Buffer (a social media scheduling tool), whereas I live by it. It doesn't work for Jeff, and there's nothing wrong with that. We are individuals, and what makes us powerful evangelists is that we figured out what works for each of us.
3. In the Know
As a technical evangelist, you can't just read the docs and think you're going to be received well. Theory is not practice, and I've had a couple of disagreements with managers, explaining why I needed to work with the product. I've had to battle for hardware to build out what I've been expected to talk on; only once did I not fight for it, and I paid for it drastically. I won't write on a topic unless I can test it out on my own. Being in the trenches provides you a point of view no document can provide.
Documentation is secondary to experience.
4. Your View is Outward
This is a difficult one for most companies when they're trying to create evangelists from internal employees. Those who are deeply involved at the company level may interact well with others, but won't redirect to an external view. I've had people ask me why my husband isn't doing as much as I am in the community. Due to his position, he must be more internally focused and customer facing. My job is very separate from that of my fellow employees: I must always be focused outward and spend at least 95% of my time interacting with the community. You'll notice all of the folks listed are continually interacting with people outside of their company and are considered very "approachable."
We volunteer our time in the community- user groups, board of directors, events and partnering with companies. We socialize, as we know our network is essential to the companies we represent.
5. We Promote
I wish I did more public promotion like I see from some of these other folks. I'm like my parents: I stand up for others and support them on initiatives and goals. I do a lot of mentoring, but less when I'm blogging. My mother was never one for empty compliments, and I did take after her on this. I'm just not very good at remembering to compliment people on social media and feel I'm lacking in this area, but I continually watch others do this for folks in the community, and it is so important.
We make sure to work with those that may need introductions in our network, support them in the community, and reach out to offer our help. In the public view, this is quite transparent, so when others pay it forward or return the favor, it can appear that people just bend over backwards for us, but we have often been there for the folks in question in the past, with no expectations, and people remembered this.
We do promote our companies, but for the right reasons: the company has done something good for the community or has something special going on. Rarely do we push out pure marketing, as it just doesn't come across very well from us. It's not authentic.
I’m not saying to be a pushover. I literally have friends muted and even blocked. There’s nothing wrong with NOT being connected to individuals that have very different beliefs or social media behavior. You shouldn’t take it personally– this is professional and you should treat it as such.
You may find, (especially for women and people of color) that certain individuals will challenge you on ridiculous topics and battle you on little details. This is just the standard over-scrutinizing that we go through and if it’s not too bad, I tell people to just ignore it and not respond. If it escalates, don’t hesitate to mute or block the person. You’re not there to entertain them and by removing your contributions from their feed- “out of sight, out of mind”, offering peace to both of you… 🙂
Contribute what you want, limit what your company wants to a certain percentage, and be authentic. Find your own niche and space, and don't send out "noise."
There are a ton of tools out there. Test out Buffer, Hootsuite, Klout or SumAll to make social media contributions easier. If you don't have a blog, create one and show what you're working on; don't worry about the topic. You'll be surprised: if you just write about challenges you're facing, how you've solved a problem you've come across, or a topic you couldn't find a solution to online, people will find value in your contributions.
Have fun with social media and have real conversations. People do appreciate honesty with respect. Answer comments and questions on your blog. Respond to questions on forums for your product and promote other people’s events and contributions.
When people approach you at an event or send you a direct message, try to engage with them and thank them for having the guts to come up and speak with you. It’s not easy for most people to approach someone they don’t know.
We used to be part of our local communities, and as our world has changed with technology, the term community has changed. These communities wouldn't exist without the contributions of people. Volunteer to help with user groups, events and forums. Don't just volunteer to be on a board of directors and then do nothing; it's not something to simply put on your CV and think you're contributing. There is incredible power in the simple act of doing, so DO. Provide value and ask how you can help. Kent has been a board member, a volunteer and even a president of user groups. Jeff has run content selection and run events even though he's limited in what he's allowed to do as an Oracle employee, and Rie promotes information about every woman speaker at SQL Saturday events, along with all she does to run the Atlanta SQL Saturday (the largest one in the US!). I won't even try to name all the different contributions Grant is part of, including the new attendees event at PASS Summit (Microsoft's version of Oracle OpenWorld, for my Oracle peeps!).
For those companies that are thinking, "I hired an evangelist, so I want them to be all about me and all invested in the company": if they are, you'll never have the successful evangelist that will be embraced by the community, able to promote your product/company in a powerful, grassroots way. If their eyes are always looking inside, they will miss everything going on outside, and as we all know, technology moves fast. Look away and you'll miss it.
There are plenty of mishaps in the early space program that prove the need for DevOps, but fifty-five years ago this month, there was one in particular that is often used as an example for all. This simple human error almost ended the whole American space program, and it serves as a strong example of why DevOps is essential as agile speeds up the development cycle.
The Mariner I space probe was a pivotal point in the space race between the United States and the Soviet Union. The space probe was a grand expedition into a series of large, sophisticated, interplanetary missions, all to carry the Mariner moniker. For this venture to launch (pun intended), it depended on a huge, and new, development project for a powerful booster rocket called the Atlas-Centaur. The development program ran into so many testing failures that NASA ended up dropping the initial project and going with a less sophisticated booster to meet the release date (i.e., features dropped from the project). The new probe designs were based on the previously used Ranger moon probes, so less testing was thought necessary, and the Atlas Agena B booster was born, bringing the Mariner project down to a meager cost of $80 million.
The goal of the Mariner I was to perform an unmanned mission to Mars, Venus and Mercury. It was equipped with solar cells on its wings to assist on the voyage, all new technology, but the booster, required to escape Earth's gravity, was an essential part of the project. As the booster was based on older technology than many of the newer features, the same attention wasn't given to it while testing was being performed.
On July 22nd, 1962, Mariner I lifted off, but approximately four minutes in, it veered off course. NASA made the fateful decision to terminate the spacecraft, destroying millions of dollars of equipment to ensure it didn't end up crashing on its own into populated areas.
As has been well documented, the guidance system, which was supposed to correct Mariner 1's flight, contained a single typo: a missing hyphen required in the instructions to adjust flight patterns. Where the program should have read "R-dot-bar sub-n," it instead read "R-dot-bar sub n." This minor change caused the program to over-correct for small velocity changes and send erratic steering commands to the spacecraft.
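The actual guidance code is long gone (and certainly wasn't Python), but a hypothetical toy feedback loop illustrates the effect: correcting on raw, noisy readings instead of the smoothed ("barred") value produces much larger, more erratic corrections.

```python
import random

def steering_corrections(smoothed, steps=200, gain=0.8, noise=0.5):
    """Toy feedback loop: apply a velocity correction each step.

    The guidance equations called for a smoothed ('barred') velocity value;
    dropping the bar means the controller chases every noisy raw reading.
    """
    rng = random.Random(42)  # identical noise stream for both runs
    error, avg, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        reading = error + rng.gauss(0, noise)  # noisy sensor reading
        avg = 0.9 * avg + 0.1 * reading        # smoothed estimate of the reading
        correction = -gain * (avg if smoothed else reading)
        total += abs(correction)
        error += correction
    return total / steps  # average correction magnitude

# The smoothed loop issues far gentler corrections than the raw one.
print(steering_corrections(True), steering_corrections(False))
```

The gain, noise level and smoothing factor here are invented for illustration; the point is only that filtering matters when a control loop feeds back on itself.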
This missing hyphen caused a loss of millions of dollars in the space program and is considered the most expensive hyphen in history.
How does this feed into the DevOps scenario?
Missing release dates for software can cost companies millions of dollars, but so can the smallest typos. Reusing code and automating programming, along with proper policies, processes and collaboration throughout the development cycle, ensures that code isn't just well written but, even in these shortened development cycles, is reviewed and tested fully before it's released. When releases are done in smaller increments, a feedback loop ensures that errors are caught early, long before they reach production.
So you're going to see a lot of posts from me in the coming months surrounding topics shared by Oracle and SQL Server. These posts offer me the opportunity to re-engage with my Oracle roots and will focus on enhancing my SQL Server knowledge of the 2014 and 2016 (and soon enough, 2017) features, which I'm behind on.
I'm going to jump right in with both feet with the topic of hints. The official (and generic) definition of a SQL hint is:
“A hint is an addition to a SQL statement that instructs the database engine on how to execute the statement.”
Hints are most often discussed in the context of queries, but they can influence the performance of inserts, updates and deletes, too. What you'll find is that the terminology is pretty much the same for hints in Oracle and SQL Server, but the syntax is different.
Oracle hints were quite common during the infancy of Oracle's cost-based optimizer (CBO). It could be frustrating for a database administrator accustomed to the rule-based optimizer (rules, people! If there's an index, use it!) to give up control of performance to a feature that simply wasn't taking the shortest route to the results. As time passed from Oracle 9i to 10g, we used hints less, trusting the CBO, and by Oracle 11g, hinting was frowned upon unless you had a very strong use case. I was in the latter scenario: my first Oracle 11g environment required not just new data but a new database weekly, along with a requirement that I guarantee performance. I knew pretty much every optimal plan for every SQL statement in the system, and it was my responsibility to make sure each new database chose the most optimal plan. I incorporated complex hints (and then profiles as we upgraded…).
With the introduction of Oracle 12c, using hints effectively became a sought-after skill again, as many new optimizer features (often with the words "dynamic" or "automated" in their names) started to impact performance beyond allowable thresholds.
SQL Server's optimizer took a big jump in features and functionality in SQL Server 2014. With this jump, and with the introduction of SQL Server 2016, we started to see a new era of SQL Server performance experts who moved even further into optimization expertise, not only in collecting performance data via dynamic management views/functions (DMVs/DMFs) but also in the ability to influence the SQL Server query optimizer with advanced statistics features and elevated hinting.
Hints have a more convoluted history in the SQL Server world than in the Oracle one. I have to send some love and attention to Kendra Little after I found a cartoon she drew about her frustration with the use of ROWLOCK hints:
After reading this, my plan is still to go deeper into a number of areas of performance, including the optimizers, but today, we’ll just stick to a high level difference on hinting in queries.
In our examples, we'll force the use of a HASH join instead of a nested loop, the use of an index for a specific table, and a MERGE join. Let's say we want a hash join on Employees and a merge join on the Job_history table. We also want to make sure we use the primary key for one of the employee ID joins, as the optimizer otherwise chooses a less optimal index that costs out lower even though its actual performance is worse due to concurrency.
The query would look like the following in Oracle:
SELECT /*+ LEADING(e2 e1) USE_HASH(e1) INDEX(e1 emp_emp_id_pk)
           USE_MERGE(j) FULL(j) */
       e1.first_name, e1.last_name, j.job_id, SUM(e2.salary) total_sal
FROM   employees e1, employees e2, job_history j
WHERE  e1.employee_id = e2.manager_id
AND    e1.employee_id = j.employee_id
AND    e1.hire_date   = j.start_date
GROUP BY e1.first_name, e1.last_name, j.job_id
ORDER BY total_sal;
If there was a subquery as part of this statement, we could add a second set of hints for it, as each query supports its own hints in the statement after the word SELECT.
If we were to take the same statement in SQL Server, the hints would look a bit different. The following is about as close as I could get to "apples to apples" in T-SQL, so please forgive me if it ain't as pretty as I would have preferred:

SELECT e1.Name, j.JobID, SUM(e2.Salary) AS Total_Salary
FROM   Employees AS e1
INNER MERGE JOIN Job_History AS j
       ON e1.EmployeeID = j.EmployeeID
      AND e1.HireDate   = j.StartDate
LEFT OUTER HASH JOIN Employees AS e2
       WITH (FORCESEEK (emp_emp_id_pk (EmployeeID)))
       ON e1.EmployeeID = e2.ManagerID
GROUP BY e1.Name, j.JobID
ORDER BY Total_Salary;
In a T-SQL statement, each hint is placed on the object in the statement that it references. The join hints are written-out keywords (vs. the comment-style hint syntax required in Oracle), and the FORCESEEK table hint forces a seek on the primary key for Employees.
As you can see, Oracle signals a hint by placing it between /*+ and */. Each platform requires some syntax and advanced performance knowledge, but all in all, the goal is the same: influence the optimizer to behave in a specific way and [hopefully] choose the optimal plan.
Please let me know if you'd like to see more in this series, either by sending me an email to dbakevlar at Gmail or by commenting on this post. Now I'm off to start preparing for KSCOPE 2017. Someone explain to me how it's already the end of June!! 🙂
I did a couple of great sessions yesterday for the awesome Dallas Oracle User Group (DOUG). It was the first time I'd given my thought-leadership piece on Making Sense of the Cloud, and it was a great talk, with some incredible questions from the DOUG attendees!
This points me to a great [older] post on things IT can do to help guarantee tech projects are more successful. DevOps is a standard in most modern IT shops, and DBAs are expected to find ways to be part of this valuable solution. If you inspect the graph displaying the ROI of different types of projects vs. how often they run over budget and time, the results may surprise you.
Where non-software projects are concerned, projects rarely run over schedule, but in the way of benefits, they often come up short. When we're dealing with software, 33% of projects run over time, but the ROI is exceedingly high and worthy of the investment. You have to wonder how much of that over-allocation in time feeds into the percentage increase in cost. If this could be deterred, think about how much more valuable these projects would become.
The natural life of a database is growth. Very few databases stay a consistent size; as companies prosper, critical data valuable to the company requires a secure storage location, and a logical structure to report on that data is necessary for the company's future. This is where relational databases come in, and they can become both the blessing and the burden of any venture. Database administrators are both respected and despised for the necessity of managing the database environment, as the health of the database is an important part of the IT infrastructure and, with the move to the cloud, a crucial part of any viable cloud migration project.
How much of the time, money and delay shown in those projects is due to the sheer size and complexity of the database tier? Our source data shows how often companies just aren't able to hold it together due to lacking skills, inaccurate time estimates and other unknowns that come back to bite us.
I can’t stress enough why virtualization is key to removing a ton of the overhead, time and money that ends up going into software projects that include a database.
Virtualizing non-production databases results in:
It’s definitely something to think about and if you don’t believe me, test it yourself with a free trial! Not enough people are embracing virtualization and it takes so much of the headache out of RDBMS management.
For over a year I've been researching cloud migration best practices, and there's one red flag that consistently trips me up as I review recommended migration paths. No matter what you read, just about all of them include the following high-level steps:
As we can see from the above, the scope of the project is identified, requirements are laid out, and a project team is allocated.
The next step in the project is to choose one or more clouds and the first environments to test out in the cloud, along with addressing security concerns and application limitations. DBAs are tested repeatedly as they try to keep up with the demand of refreshing the cloud environments or keeping them in sync with on-prem, and the cycle continues until a cutover date is issued. The migration go/no-go occurs, and either the non-production tier or the entire environment is migrated to the cloud.
As someone who works for Delphix, I focus on the point of failure where DBAs can't keep up with full clones and data refreshes in cloud migrations, or where development and testing aren't able to complete the steps they could if the company were using virtualization. From a security standpoint, I'm concerned with how few companies are investing in masking, given the sheer quantity of breaches in the news, but as a DBA, there's a whole different scenario that really makes me question the steps many companies use to migrate to the cloud.
Now here's where they lose me every time: the last step in most cloud migration plans is to optimize.
I'm troubled by optimization being viewed as the step you take AFTER you migrate to the cloud. Yes, there will undoubtedly be unknowns that no one can take into consideration before the physical migration to a cloud environment, but taking databases "as is," when an abundance of performance data is already known that could and will impact performance, seems to invite unwarranted risk and business impact.
So here’s my question to those investing in a cloud migration or have already migrated to the cloud- Did you streamline and optimize your database/applications BEFORE migrating to the cloud or AFTER?
I was in a COE (Center of Excellence) meeting yesterday and someone asked me, "Kellyn, is your blog correct? Are you really speaking at a blockchain event?" Yeah, I'm all over the technical map these days, and you know what?
I love the variety of technology, the diversity of attendees and the differences in how the conferences are managed. Now, that last one might seem odd, and you might think they'd all be similar, but it's surprising how different they really are.
Today I'm going to talk about an aspect of conferences that's very near to my heart: networking via events. For women in technology, there are some unique challenges when it comes to networking. Men have concerns about approaching women to network, such as fear of accusations of inappropriate interaction, and women face the challenge that a lot of networking opportunities occur outside the workplace, in social situations we may not be comfortable in. No matter who you are and no matter your intentions, there's a lot of wariness, and in the end, women often just lose out when it comes to building their network. I've been able to navigate this pretty successfully, but I have seen it backfire and have found myself on more than one occasion defending both genders who've ended up on the losing side of the situation.
With that said, conferences and other professional events can help us geeks build our networks, and it's not all about networking events. I noticed a while back that the SQL Server community appeared to be more networked among its members. I believe part of this is due to the long history of their event software and some of its features.
Looking at the SQL PASS website, specifically the local user group event management software, notice that it's all centralized. Unlike the largely independent Oracle user groups, SQL Server user groups use a centralized repository for their event management, speaker portal, scheduling, etc. That's not to say there aren't events outside of PASS Summit and SQL Saturdays (there are actually a ton), but this is the portal for the regional user groups, creating the hub that bridges out to the larger community.
Besides submitting my abstract proposals to as many SQL Saturdays worldwide as I like from one portal, I can also maintain one speaker biography, along with information about my blog, Twitter, LinkedIn and other social media, in this one location.
The second benefit of this simplicity is that these biographies and profiles "feed" the conference schedules and event sites. You have a central location for management, but hundreds of event sites where different members can connect. After abstracts have been approved and the schedule is built, I can easily go into an event's schedule, click on each speaker biography and choose to connect with anyone who has entered their social media information in their global profile.
Using my profile as an example, you’ll notice the social media icons under my title are available with a simple click of the mouse:
This gives me not only an easy way to network with my fellow speakers, but also an excuse to network with them! I can click on each of the social media buttons and choose to follow each of the speakers on Twitter and connect with them on LinkedIn. I send a note with the LinkedIn connection telling the speaker that we're both speaking at the event and that, because of this, I'd like to add them to my network.
As you can join as many regional and virtual user groups as you like, (and your PASS membership is free) I joined the three in Colorado, (Denver, Boulder and Colorado Springs.) Each one offers the ability to connect with the board members using a similar method, (I'll use Todd and David from the Denver SQL Server user group as my examples.)
The Oracle user groups have embraced adding Twitter links to most speaker bios and some board pages, but I know that for RMOUG, many still hesitate or aren't using social media to the extent they could. I can't stress enough how impressed I am when I see events incorporate LinkedIn and Twitter into their speaker and management profiles, knowing the value they bring to technical careers, networks and the community.
Although the SQL Server community is a good example, they aren’t the only ones. I’m also speaking at new events on emergent technologies, like Data Platforms 2017. I’ll be polite and expose my own profile page, but I’m told I’m easy to find in the sea of male speakers… 🙂 Along with my picture, bio and session information, there are links to my social media connections, allowing people to connect with me:
Yes, the Bizzabo software, (same software package that RMOUG will be using for our 2018 conference, along with a few other Oracle events this coming year) is aesthetically appealing, but more importantly, it incorporates important networking features that in the past just weren’t as essential as they are in today’s business world.
I first learned the networking tactic of connecting with the people I'm speaking alongside from Jeff Smith, and I think it's a great skill everyone should take advantage of, whether you're speaking or just attending. For women, I think it's essential to your career to take advantage of opportunities to network outside of the traditional ways we've been taught, and this is just one more way to work around that glass ceiling.
I recently switched to a Mac after decades of using PCs. I loved my Surface Pro 4 and still do, but since I was providing content for people I assumed would be on Macs, it seemed like a good idea at the time. I didn't realize then that I'd be doing as many SQL Server conferences as Oracle ones in my next role with Delphix… 🙂
With this change, I found myself limited to VMs running SQL Server on my Mac; then I started working with Azure, and it seemed like a lot of extra "weight" just to have access to a few tools. I figured I wasn't the only one and did some research, locating SQLPro for MSSQL from Hankinsoft. It's an easy-to-configure, 12MB query interface, (also available from the App Store) created for SQL Server users who find themselves on a Mac.
If you're using Azure, you simply need to update your firewall rules to allow access for your local IP address, and connecting to the Azure SQL database is simple after this update. (The rule will need to be updated each time if, like me, you change locations, and therefore IP addresses, regularly.)
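If you'd rather script this step than click through the portal, the same rule can be created from the command line. This is only a sketch: it assumes the Azure CLI's `az sql server firewall-rule create` command, and the resource group, server name and IP address below are all placeholders, so the command is composed and printed for review rather than executed:

```shell
# Compose (but don't run) the Azure CLI call that allows your current
# public IP through the Azure SQL logical server firewall.
# All names below are placeholders; substitute your own.
resource_group="my-rg"            # assumption: your resource group
server="my-azure-sql-server"      # assumption: your logical server name
my_ip="203.0.113.10"              # replace with your current public IP

cmd="az sql server firewall-rule create --resource-group $resource_group \
--server $server --name laptop-ip \
--start-ip-address $my_ip --end-ip-address $my_ip"

# Print the command so it can be reviewed before running it for real
echo "$cmd"
```

Since the rule is keyed to a single IP, re-running the same command with a new `--start-ip-address`/`--end-ip-address` pair is all a change of location requires.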
You can collect the information for your Azure database from the Azure administration Console, under Database and Overview:
This offers a robust, full-featured tool, comparable to Oracle's SQL Developer, for those who want to work with SQL Server databases while on a Mac. I didn't go into the features; those I leave to you to discover… 🙂
I’m itching to dig more into the SQL Server 2016 optimizer enhancements, but I’m going to complete my comparison of indices between the two platforms before I get myself into further trouble with my favorite area of database technology.
Index Organized Tables, (IOTs) are just another variation on a primary b-tree index, but unlike a standard table with an index simply enforcing uniqueness, the index IS the table. The data is arranged in clustered primary key order to improve performance.
This is the closest to a clustered index in SQL Server that Oracle will ever get, so it makes sense that a comparison in performance and fragmentation is the next step after I’ve performed standard index and primary key index comparisons to Oracle.
Let’s create a new copy of our Oracle objects, but this time, update to an Index Organized Table:
CREATE TABLE ora_tst_iot(
   c1 NUMBER,
   c2 varchar2(255),
   CREATEDATE timestamp DEFAULT CURRENT_TIMESTAMP,
   CONSTRAINT pk_ora_iot PRIMARY KEY (c1))
ORGANIZATION INDEX
TABLESPACE users
PCTTHRESHOLD 20
OVERFLOW TABLESPACE users;
CREATE SEQUENCE C1_iot_SEQ START WITH 1;

CREATE OR REPLACE TRIGGER C1_iot_BIR
BEFORE INSERT ON ORA_TST_IOT
FOR EACH ROW
BEGIN
   SELECT C1_iot_SEQ.NEXTVAL INTO :new.C1 FROM DUAL;
END;
/
PCTTHRESHOLD can be anywhere between 1 and 50; I chose 20 for this example. I didn't add any compression, as C1 is a simple sequence that won't benefit from it, and I added the same supporting objects, a sequence and a trigger, as I did in the previous test on the standard Oracle table.
Now we’ll insert the rows from ORA_INDEX_TST into ORA_TST_IOT
SQL> insert into ora_tst_iot(c2)
  2  select c2 from ora_index_tst;

995830 rows created.

Elapsed: 00:00:04.01
There won't be any fragmentation in the current table- it was directly loaded- no deletes, no updates. Although it won't be shown in the examples, I will collect stats at regular intervals and flush the cache to ensure cached results don't skew any of my tests.
SQL> ANALYZE INDEX PK_INDEXPS VALIDATE STRUCTURE;

Index analyzed.

Elapsed: 00:00:00.37

SQL> select index_name from dba_indexes
  2  where table_name='ORA_TST_IOT';

INDEX_NAME
------------------------------
PK_ORA_IOT
SQL> analyze index pk_ora_iot validate structure;

Index analyzed.

SQL> analyze index pk_ora_iot compute statistics;

Index analyzed.

SQL> SELECT LF_BLKS, LF_BLK_LEN, DEL_LF_ROWS, USED_SPACE, PCT_USED
  2  FROM INDEX_STATS where NAME='PK_ORA_IOT';

   LF_BLKS LF_BLK_LEN DEL_LF_ROWS USED_SPACE   PCT_USED
---------- ---------- ----------- ---------- ----------
     32115       7996           0  224394262         88
Well, the table IS THE INDEX, so we collect stats on the table. Now let's remove some data, rebuild, and see what we can do to this IOT-
SQL> select * from ora_tst_iot
  2  where c1=994830;

    994830
SBTF02LYEQDFGG2522Q3N3EA2N8IV7SML1MU1IMEG2KLZA6SICGLAVGVY2XWADLZSZAHZOJI5BONDL2L
0O4638IK3JQBW7D92V2ZYQBON49NHJHZR12DM3JWJ1SVWXS76RMBBE9OTDUKRZJVLTPIBX5LWVUUO3VU
VWZTXROKFWYD33R4UID7VXT2NG5ZH5IP9TDOQ8G0
10-APR-17 03.09.52.115290 PM

SQL> delete from ora_tst_iot
  2  where c1 >=994820
  3  and c1 <=994830;

11 rows deleted.

SQL> commit;
Above we can see what the data originally looked like- C2 is a large column whose data consumes significant space.
What if we now disable our trigger for our sequence and reinsert the rows with smaller values for c2, rebuild and then update with larger values again?
ALTER TRIGGER C1_IOT_BIR DISABLE;

INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994820, 'A');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994821, 'B');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994822, 'C');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994823, 'D');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994824, 'E');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994825, 'F');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994826, 'G');
INSERT INTO ORA_TST_IOT(C1,C2) VALUES (994827, 'H');
so on and so forth till we reach 994830…
COMMIT and then let’s rebuild our table…
ALTER TABLE ORA_TST_IOT MOVE;
What happens to the table, (IOT) when we issue this command? It moves all the rows back to fill up each block up to the PCTFREE setting. For an IOT, we can't simply rebuild the index, as the index IS THE TABLE.
Now we've re-organized our IOT so the blocks take up only the space they would have consumed when first loaded. So let's see what happens when we issue an UPDATE to those rows-
SQL> update ora_tst_iot set c2=dbms_random.string('B',200)
  2  where c1 >=994820
  3  and c1 <=994830;

11 rows updated.
So how vulnerable are IOTs to different storage issues?
Chained rows after updating, moving data, and then updating to larger data values than the originals, with only 10% free on each block? Just 11 updated rows show the pressure:
SQL> SELECT 'Chained or Migrated Rows = '||value
  2  FROM v$sysstat
  3  WHERE name = 'table fetch continued row';

Chained or Migrated Rows = 73730
Let’s delete and update more rows using DML like the following:
SQL> delete from ora_tst_iot
  2  where c2 like '%300%';

4193 rows deleted.
Insert rows containing '300' with varying lengths, delete more, rinse and repeat with updates and deletes…
So what has this done to our table as we insert, update, delete and then insert again?
SQL> SELECT table_name, iot_type, iot_name
  2  FROM USER_TABLES WHERE iot_type IS NOT NULL;

TABLE_NAME                     IOT_TYPE     IOT_NAME
------------------------------ ------------ ------------------------------
SYS_IOT_OVER_88595             IOT_OVERFLOW ORA_TST_IOT
ORA_TST_IOT                    IOT
This is where a clustered index and an IOT are very different: there is a secondary management object involved when there is overflow. If you look back at my CREATE statement, yes, I chose to create an overflow segment. Even if I drop the IOT properly, the overflow table will go into the recycle bin, (unless I've configured the database without it.)
SQL> select index_name from dba_indexes
  2  where table_name='ORA_TST_IOT';

INDEX_NAME
------------------------------
PK_ORA_IOT

SQL> analyze index pk_ora_iot validate structure;

Index analyzed.

SQL> select blocks, height, br_blks, lf_blks from index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     32768          3         45      32115
Looking at the block counts, there aren't many rows per leaf block- even though this is a new table without a lot of DML, the current configuration stores relatively few rows in each leaf block.
When we select from the IOT, the index is in full use and we can see that with the proper pct free/pct used, the index is still in pretty good shape:
SQL> select * from table(dbms_xplan.display_awr('fbhfmn88tq99z'));

select c1, c2, createdate from ora_tst_iot

Plan hash value: 3208808379

----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |       |       |  8709 (100)|          |
|   1 |  INDEX FAST FULL SCAN| PK_ORA_IOT |   995K|   206M|  8709   (1)| 00:01:45 |
----------------------------------------------------------------------------------

13 rows selected.
SQL> analyze index pk_ora_iot validate structure;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     32768          3         45      32058

SQL> select pct_used from index_stats;

  PCT_USED
----------
        88
So now what happens, if like our original test, we shrink down the percentage of what can be used and reorganize, (and please don’t do this in production…or test….or dev….or ever! 🙂)?
SQL> alter table ora_tst_iot move pctfree 90;

Table altered.

SQL> analyze index pk_ora_iot validate structure;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
    172928          3        228     165630
Well, that's a few more leaf blocks, eh? Now let's insert again after re-enabling the trigger-
SQL> BEGIN
  2    FOR i in 1..1000000 LOOP
  3      INSERT INTO ORA_TST_IOT(c2) VALUES(i);
  4    END LOOP;
  5    COMMIT;
  6  END;
  7  /
Now we have our elapsed times for the Azure SQL Server inserts of 1 million records at 100% and 10% fill factor. Let's compare them to our IOT. The IOT MOVE command to fill the blocks to 100% was quite fast. Of course the reverse, allowing only 10% filled, (90% free) took F.O.R.E.V.E.R…, (OK, it sure felt like it… why didn't I just truncate it? Oh yeah, I wanted it to be a real test, not simply an easy one.)
Note: For this test, we’ll rebuild after updating the pctfree each time.
10% Fill Factor in SQL Server and 1 million insert: Elapsed time: 23 minutes, 18 seconds
90% PCTFree in Oracle and 1 million insert: 7 min, 12 seconds
100% Fill Factor in SQL Server and 1 million insert: Elapsed Time: 4 minutes, 43 seconds
0% PCTFree in Oracle and 1 million insert: 1 min, 8 seconds
REBUILD of the Oracle IOT to make it 90% free in each block? Elapsed Time: 8 hrs, 21 minutes, 12 seconds
…along with four backups' worth of archive logs it generated, which filled up the archive destination… 🙂 Now, the AWS trial is meant to test out the Delphix product, not index performance in a high insert/delete/update scenario, so I was asking a lot of it with these challenges, but it was still a great way to build this environment quickly and run the comparison.
In this test, these were the overall results:
Now there are more comparisons to do, so I'm going to dig in more on the SQL Server side, but here's to Oracle Index Organized Tables, (IOTs)!
This post has a lot of the support code and data for my Oak Table Talk that I’ll be giving at IOUG Collaborate 2017 in Las Vegas on April 5th, 2017.
One of the Iceland 2017 SQL Saturday sessions got me thinking about indexing and how similar and different it all is in Oracle vs. SQL Server. There were some really fun, (well, at least what I call fun…) test cases built out and referenced by Paul Randal. After looking through some of it, I decided it might be interesting to replicate it in Oracle, (as closely as possible) and compare how the two database platforms deal with index storage, specifically SQL Server's Fill Factor vs. Oracle's PCTFREE.
B-tree indexing is the cornerstone of physically optimizing searches on data. No consensus exists on what the "B" stands for: some think it's from Bayer, one of the main gentlemen who did the research, and many more believe it's for Boeing, at whose research center the work was done.
The choices in how the data is organized, leaf blocks and such, are pretty standard, but database platforms have created some unique indexing that enhances queries on an RDBMS vs. just having heap tables.
Using Oracle and SQL Server as our choice for a comparison today, there are a few translations I need for readers of this blog:
|Oracle|SQL Server|Description|
|Index Organized Table, (IOT)|Clustered Index|Physical index storing the data in its key order. In SQL Server, there can be only one clustered index per table.|
|PCTFREE of block|FILLFACTOR of page|Percent of storage allowed to be filled, (or reserved free, in PCTFREE's case.) Each platform applies this at different times.|
|Sequence|IDENTITY|Ability to populate data with a sequential number|
|dbms_random.string|REPLICATE|Ability to populate data with string values|
|Block|Page|Unit of storage|
|Automatic Workload Repository, (AWR)|Dynamic Management Views, (DMVs)|Performance data collection|
Now that we have that out of the way, you can use this trusty little table for common terms that require a "translation" from one database platform to the other.
The next thing to remember is that PCTFREE and FILLFACTOR aren't adhered to at all times. Appending a row to an index is different from updating a row in an index, and each platform has its own criteria for deciding whether it follows the percentage rule for a given block or page.
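To make the relationship concrete, here's a quick back-of-envelope sketch, (shell arithmetic only, ignoring block and page header overhead, so the byte counts are illustrative) showing that Oracle's PCTFREE 10 and SQL Server's FILLFACTOR 90 describe the same storage target from opposite directions:

```shell
# Back-of-envelope: bytes targeted for row data in an 8KB block/page.
# Header and row overhead are ignored for simplicity.
block_bytes=8192

# Oracle: PCTFREE reserves a percentage as free; the remainder is fillable
pctfree=10
oracle_fill=$(( block_bytes * (100 - pctfree) / 100 ))

# SQL Server: FILLFACTOR states the fillable percentage directly
fillfactor=90
sqlserver_fill=$(( block_bytes * fillfactor / 100 ))

echo "Oracle PCTFREE $pctfree -> $oracle_fill bytes fillable"        # 7372
echo "SQL Server FILLFACTOR $fillfactor -> $sqlserver_fill bytes fillable"  # 7372
```

Same target, inverted dial: PCTFREE says how much to keep free, FILLFACTOR says how much to fill.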
The steps of this test:
Yes, I could have just used ROWNUM, but I was trying to kill a second bird, (another testing task) with this one stone, so a trigger with a sequence it is… 🙂
CREATE TABLE ORA_INDEX_TST (
   C1 NUMBER NOT NULL,
   C2 VARCHAR2(255),
   CREATEDATE TIMESTAMP);

CREATE INDEX PK_INDEXPS ON ORA_INDEX_TST (C1);

ALTER TABLE ORA_INDEX_TST ADD CONSTRAINT OIT_PK PRIMARY KEY(C1) USING INDEX PK_INDEXPS;

CREATE UNIQUE INDEX IDX_INDEXPS ON ORA_INDEX_TST(C2);

ALTER INDEX PK_INDEXPS REBUILD PCTFREE 90 INITRANS 5;
ALTER INDEX IDX_INDEXPS REBUILD PCTFREE 90 INITRANS 5;

CREATE SEQUENCE C1_SEQ START WITH 1;

CREATE OR REPLACE TRIGGER C1_BIR
BEFORE INSERT ON ORA_INDEX_TST
FOR EACH ROW
BEGIN
   SELECT C1_SEQ.NEXTVAL INTO :new.C1 FROM DUAL;
END;
/
We'll need to manually insert just enough data to fill up one block, which is 8KB in this database, (the default.)
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('A', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('B', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('C', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('D', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('E', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('F', 200), SYSDATE);
INSERT INTO ORA_INDEX_TST (C2, CREATEDATE) VALUES (dbms_random.string('G', 200), SYSDATE);
COMMIT;
We’ll now verify that our data is inserted into one block:
SQL> ANALYZE INDEX PK_INDEXPS VALIDATE STRUCTURE;

SQL> SELECT LF_BLKS, LF_BLK_LEN, DEL_LF_ROWS, USED_SPACE, PCT_USED
  2  FROM INDEX_STATS where NAME='PK_INDEXPS';

   LF_BLKS LF_BLK_LEN DEL_LF_ROWS USED_SPACE   PCT_USED
---------- ---------- ----------- ---------- ----------
         1       7924           0       1491         19
Code for SQL Server Objects and Support. Since I didn’t have the same secondary project request, this one will appear simpler:
CREATE TABLE SQL_INDEX_TST (
   c1 INT NOT NULL,
   c2 CHAR (255),
   createdate datetime);

CREATE INDEX CL2_INDEX_TST ON SQL_INDEX_TST(C2);
GO

ALTER TABLE SQL_INDEX_TST
ADD CONSTRAINT PK_CLINDX_TST PRIMARY KEY CLUSTERED (c1);
GO
First, in SQL Server, a page holds around 8KB of data, so let's test out our index storage:
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (1, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (2, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (3, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (4, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (5, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (6, 'a');
INSERT INTO SQL_INDEX_TST(c1,c2) VALUES (7, 'a');
GO
We now have officially “filled” the first page as much as possible and we should see this if we query the information schema:
SELECT OBJECT_SCHEMA_NAME(ios.object_id) + '.' + OBJECT_NAME(ios.object_id) as table_name,
       i.name as index_name,
       leaf_allocation_count,
       nonleaf_allocation_count,
       fill_factor,
       type_desc
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.SQL_INDEX_TST'), NULL, NULL) ios
INNER JOIN sys.indexes i
   ON i.object_id = ios.object_id
   AND i.index_id = ios.index_id;
Back on the Oracle side, we'll load 1 million rows:

SQL> Begin
  2    For IDS in 1..1000000 Loop
  3      INSERT INTO ORA_INDEX_TST (C2)
  4      VALUES (dbms_random.string('X', 200));
  5      Commit;
  6    End loop;
  7  End;
  8  /
10% PCT Free- Time Elapsed 2 minutes, 12 seconds
90% PCT Free- Time Elapsed 7 minutes, 3 seconds
I'll have both the initial test data and the 1 million new rows I've added:
SQL> select count(*) from ora_index_tst;

  COUNT(*)
----------
   1000008
Let's delete some of this data load to create deleted leaf rows:
SQL> delete from ora_index_tst
  2  where c2 like '%200%';

4179 rows deleted.

SQL> commit;

Commit complete.
Now let’s analyze and take a look at the stats again:
SELECT LF_BLKS, LF_BLK_LEN, DEL_LF_ROWS, USED_SPACE, PCT_USED
FROM INDEX_STATS where NAME='PK_INDEXPS';

   LF_BLKS LF_BLK_LEN DEL_LF_ROWS USED_SPACE   PCT_USED
---------- ---------- ----------- ---------- ----------
     41227       7924         121  212596009         19
There’s a substantial difference in number of leaf blocks vs. when the pct_used is allowed to fill up:
   LF_BLKS LF_BLK_LEN DEL_LF_ROWS USED_SPACE   PCT_USED
---------- ---------- ----------- ---------- ----------
      2004       7996         531   15985741        100
Oracle wasn't impacted by PCTFREE that much, but there was some impact. Rebuilds helped clean up some waits, but they weren't a true "requirement", more of a preference after consistent deletes, updates that resized data from the original, and poor storage choices. The differences in performance weren't that significant.
Now that we know we have deleted rows, let’s do the same on the SQL Server side:
declare @id int
select @id = 9  -- already inserted 8 rows

while @id >= 0 and @id <= 1000000
begin
   insert into sql_index_tst (c1, c2)
   values (@id, 'DKUELKJ' + convert(varchar(7), @id))
   select @id = @id + 1
end
Default Fill Factor- Elapsed Time: 4 minutes, 43 seconds
10% Fill Factor- Elapsed time: 23 minutes, 18 seconds
Delete some rows to test similar to Oracle:
DELETE FROM SQL_INDEX_TST
WHERE c2 LIKE '%200%';
Now there are a few ways we can look at how the indexes were impacted. We’ll first check for page splits, which as we’ve discussed, cause extra work to the transaction log and fragmentation in the index:
SELECT OBJECT_SCHEMA_NAME(ios.object_id) + '.' + OBJECT_NAME(ios.object_id) as table_name,
       i.name as index_name,
       leaf_allocation_count,
       nonleaf_allocation_count
FROM sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.SQL_Index_tst'), NULL, NULL) ios
INNER JOIN sys.indexes i
   ON i.object_id = ios.object_id
   AND i.index_id = ios.index_id;
Next, we’ll look at the physical fragmentation of the index:
SELECT OBJECT_SCHEMA_NAME(ips.object_id) + '.' + OBJECT_NAME(ips.object_id) as table_name,
       ips.avg_fragmentation_in_percent,
       ips.fragment_count,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SQL_Index_tst'), NULL, NULL, NULL) ips;
GO
There’s significant fragmentation and it also impacted performance as we viewed above.
USE AS_test;
GO
DBCC DBREINDEX ('SQL_INDEX_TST', CL2_INDEX_TST, 100);
DBCC DBREINDEX ('SQL_INDEX_TST', PK_CLINDX_TST, 100);
GO
We've now rebuilt our indexes and moved the fill factor to 100%. Queries using each index column in the WHERE clause improved by over 20%. Inserts and updates improved to perform similarly to Oracle, unless…
Sorts on the C1 column of a clustered index in SQL Server improved dramatically and out-performed Oracle's primary key. Only IOTs could compete, and the use cases where they were beneficial were very narrow.
So who won out in my comparison at Oak Table World? As we always hear from the DBA,
Some of the benefits of clustered indexes in SQL Server are superior to Oracle:
There are negatives that leave this debate still open for me:
There were a lot of other tests and queries beyond what is presented here, but this is the main focus of the test. I need to thank those whose deep index knowledge inspired me to go research on my own. Shout out to Richard Foote, Mr. Oracle Index, and to Paul Randal and Jason Strate for the SQL Server expertise!
I ended up speaking at two events this last week. Now if timezones and flights weren’t enough to confuse someone, I was speaking at both an Oracle AND a SQL Server event- yeah, that’s how I roll these days.
I arrived last Sunday in Salt Lake, which is like a slightly milder-weather, more conservative version of Colorado, to speak at UTOUG's Spring Training Days conference. I love this location, and the weather was remarkable, but even with the warm temps, skiing was still only a half-hour drive from the city. Many of the speakers and attendees took advantage of the opportunity to do just that while visiting. I chose to hang out with Michelle Kolbe and Lori Lorusso. I had a great time at the event, and although I was only onsite for 48 hours, I really like having this event so close to my home state.
I presented Virtualization 101 for DBAs to a well-attended session. I really loved how many questions I received and how curious the database community has become about how virtualization is the key to moving to the cloud seamlessly.
There are significant takeaways from UTOUG. The user group, although small, is well cared for, and the event uses some of the best tools to ensure they get the best bang for the buck. It's well organized, and I applaud all that Michelle does to keep everyone engaged. It's not an easy endeavor, yet she takes the challenge on with gusto and much success.
After spending Wednesday at home, I was back at the airport to head to Reykjavik, Iceland for their SQL Saturday. I’ve visited Iceland a couple times now and if you aren’t aware of this, IcelandAir offers up to 7 day layovers to visit Iceland and then you can continue on to your final destination. Tim and I have taken advantage of this perk on one of our trips to OUGN, (Norway) and it was a great way to visit some of this incredible country. When the notification arrived for SQL Saturday Iceland, I promptly submitted my abstracts and crossed my fingers. Lucky for me, Ásgeir Gunnarsson accepted my abstract and I was offered the chance to speak with this great SQL Server user group.
After arriving before 7am on Friday morning at Keflavik airport, I realized that I wouldn't have a hotel room ready for me, no matter how much I wanted to sleep. Luckily, there's a great article on the "I Love Reykjavik" site offering inside info on what to do if you show up early. I was able to use the FlyBus shuttle directly to and from my hotel, (all you have to do is ask the front desk to call them the night before you leave, and they'll pick you up in front of your hotel three hours before your flight.) Once I arrived, I checked my bags with the front desk and headed out into town.
I stayed at Hlemmur Square, which was central to the town and the event, and next to almost all of the buses throughout the city. The street in front of it, Laugavegur, is one of the main streets running east-west and is very walkable. Right across this street from the hotel was a very "memorable" museum, the Phallological Museum. I'm not going to link to it or post any pictures, but if you're curious, I'll warn you, it's NSFW, even if it's very, uhm… educational. It was recommended by a few folks on Twitter, and it did ensure I stayed awake after only 2 hours of sleep in 24 hours!
As I wandered about town, I noted a few things about Iceland- the graffiti murals are really awesome, and Icelandic folks like good-quality products- the stores housed local and international goods, often made from wool, wood, quality metals and such. The parliament building is easily accessible, right across from the main shopping area and new city development.
On Saturday, I was quick to arrive at Iceland’s SQL Saturday, as I had a full list of sessions I wanted to attend. I was starting to feel the effects of Iceland weather on my joints, but I was going to make sure I got the most out of the event. I had connected with a couple of the speakers at the dinner the night before, but with jet lag, you hope you’ll make a better impression on the day of the event.
I had the opportunity to learn about the most common challenges with SQL Server 2016, and that Dynamic Data Masking isn't an enterprise solution. Due to lacking discovery tools, the ability to join to non-masked objects, and common values, (i.e. if 80% of the data is local, the most common location value is easily identified) the confidential data behind masked objects can still be identified.
I also enjoyed an introduction to containers with SQL Server and security challenges. The opening slide from Andy says it all:
Makes you proud to be an American, doesn’t it? 🙂
My session was in the afternoon, and we not only had excellent discussions on how to empower database environments with virtualization, but I even did a few quick demonstrations of the ease of cloud management with AWS and Oracle… yes, to SQL Server DBAs. It was interesting for them to see how easily Oracle could be managed through the interface. I performed all validations of data refreshes from the command line, so there was no doubt I was working in Oracle, yet the refreshes and such were done in AWS and with the Delphix admin console.
I made it through the last session on the introduction to containers with SQL Server, which included a really interesting demonstration of a SQL Server container sans an OS installation, allowing it to run with very limited resource requirements on a Mac. After this session was over, I was thankful that two of my fellow presenters were willing to drop me off at my hotel and I promptly collapsed in slumber, ready to return home. I was sorry to miss out on the after event dinner and drinks, but learned that although I love Iceland, a few days and some extra recovery time may be required.
I thought I’d do something on Oracle this week, but then Microsoft made an announcement that was like an early Christmas present- SQL Server release for Linux.
I work for a company that supports both Oracle and SQL Server, so I wanted to know how *real* this release was. I first wanted to test it out on a new build, and since they recommend, (and link to) an Ubuntu install, I created a new VM and started from there-
There were a couple of packages missing until the repository configuration was updated to pull from universe, by adding the repository locations to the sources.list file:
There is also a carriage return at the end of the MSSQL repository line when it's added to the sources.list file. Remove it before you save.
Once you do this, if you've chosen to share your network connection with your Mac, you should be able to install successfully by running the commands found on the install page from Microsoft.
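The two sources.list fixes above can be sketched as a small script. It runs against a scratch copy in a temp directory so nothing system-wide is modified, and the repo lines are illustrative for Ubuntu 16.04, (xenial) so adjust them to your release:

```shell
# Sketch of the two fixes: add the universe component and strip a stray
# trailing carriage return. Applied to a scratch copy of sources.list.
tmp=$(mktemp -d)
sources="$tmp/sources.list"

# Simulate a pasted MSSQL repo line that picked up a trailing carriage return
printf 'deb [arch=amd64] https://packages.microsoft.com/ubuntu/16.04/mssql-server xenial main\r\n' > "$sources"

# 1. Add the universe component so the missing dependencies can resolve
echo 'deb http://archive.ubuntu.com/ubuntu xenial universe' >> "$sources"

# 2. Strip stray carriage returns before saving the file
sed -i 's/\r$//' "$sources"

cat "$sources"
```

On a real system you'd make the same two edits to /etc/apt/sources.list and follow them with an `apt-get update`.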
The second install I did was on a VM using CentOS 6.7 that was pre-discovered as a source for one of my Delphix engines. The installation failed upon running it, which you can see here:
Even attempting to work around this wasn't successful- the challenge was that the older openssl wasn't going to work with the new SQL Server installation. I decided to simply upgrade to CentOS 7.
The actual process of upgrading is pretty easy, but there are some instructions out there that are incorrect, so here are the proper steps. First, create a new repo file under /etc/yum.repos.d/ with the following contents:
[upgrade]
name=upgrade
baseurl=http://dev.centos.org/centos/6/upg/x86_64/
enabled=1
gpgcheck=0
Save this file and then run the following:
yum install preupgrade-assistant-contents redhat-upgrade-tool preupgrade-assistant
You may see that one of them states it won't install because newer packages are available- that's fine. As long as you have the newer packages, you're fine. Now run the pre-upgrade tool.
The final log output may not be written, either. If the runs otherwise report that they completed successfully, you can trust that the pre-upgrade was successful as a whole.
Once this is done, import the GPG Key:
rpm --import http://mirror.centos.org/centos/RPM-GPG-KEY-CentOS-7
After the key is imported, then you can start the upgrade:
/usr/bin/redhat-upgrade-tool-cli --force --network 7 --instrepo=http://mirror.centos.org/centos/7/os/x86_64
Once done, then you’ll need to reboot before you run your installation of SQL Server:
Once the VM has cycled, you can run the installation following the RedHat instructions as root, (my delphix user doesn't have the rights, and I decided to have MSSQL installed under root for this first test run):
curl https://packages.microsoft.com/config/rhel/7/mssql-server.repo > /etc/yum.repos.d/mssql-server.repo
Now run the install:
sudo yum install -y mssql-server
Once it's completed, it's time to set up your MSSQL admin password:
One more reboot and you’re done!
You should then see your SQL Server service running with the following command:
systemctl status mssql-server
You’re ready to log in and create your database, which I’ll do in a second post on this fun topic.
OK, you linux fans, go MSSQL! 🙂
OK, so I'm all over the map, (technology wise,) right now. One day I'm working with data masking on Oracle, the next it's SQL Server or MySQL, and the next it's DB2. After almost six months of this, the chaos of feeling like a fast food drive thru with 20 lanes open at all times is starting to make sense, and my brain is starting to find efficient ways to siphon all this information into the correct "lanes". No longer is the lane that asked for a hamburger getting fries with hot sauce… 🙂
One of the areas I've been spending some time on is the optimizer and its differences in Microsoft SQL Server 2016. I'm quite adept on the Oracle side of the house, but on the MSSQL side, the cost-based optimizer was *formally* introduced in SQL Server 2000, and filtered statistics weren't introduced until 2008. While I was digging into the deep challenges of the optimizer on the Oracle side during that time, with MSSQL I spent considerable time looking at execution plans via dynamic management views, (DMVs,) to optimize for efficiency. It simply wasn't at the same depth as Oracle until subsequent releases, and it has since grown tremendously in the SQL Server community.
As SQL Server 2016 takes hold, the community is starting to embrace an option that Oracle folks have used historically- when a new release comes out, if you're on the receiving end of significant performance degradation, you have the choice to set the compatibility mode to the previous version.
I know there are a ton of Oracle folks out there that just read that and cringed.
Compatibility in MSSQL is now very similar to Oracle's: optimizer features are allocated by a release version value on each platform. In Oracle 12c Release 2, for example, the compatible parameter is set to a value such as 12.2.0.
SQL Server has had this for some time, as you can see by the following table:
| Product | Database Engine Version | Compatibility Level Designation | Supported Compatibility Level Values |
|---|---|---|---|
| SQL Server 2016 | 13 | 130 | 130, 120, 110, 100 |
| SQL Database | 12 | 120 | 130, 120, 110, 100 |
| SQL Server 2014 | 12 | 120 | 120, 110, 100 |
| SQL Server 2012 | 11 | 110 | 110, 100, 90 |
| SQL Server 2008 R2 | 10.5 | 100 | 100, 90, 80 |
| SQL Server 2008 | 10 | 100 | 100, 90, 80 |
| SQL Server 2005 | 9 | 90 | 90, 80 |
| SQL Server 2000 | 8 | 80 | 80 |
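As a quick reference, the supported values in the table above can be captured in a small lookup- this is just an illustrative Python sketch using only the values shown, handy for sanity-checking a planned compatibility change:

```python
# Supported compatibility level values per release, from the table above.
SUPPORTED_LEVELS = {
    "SQL Server 2016": (130, 120, 110, 100),
    "SQL Database": (130, 120, 110, 100),
    "SQL Server 2014": (120, 110, 100),
    "SQL Server 2012": (110, 100, 90),
    "SQL Server 2008 R2": (100, 90, 80),
    "SQL Server 2008": (100, 90, 80),
    "SQL Server 2005": (90, 80),
    "SQL Server 2000": (80,),
}

def can_set_level(product, level):
    """True if the given compatibility level is valid for the product."""
    return level in SUPPORTED_LEVELS.get(product, ())

print(can_set_level("SQL Server 2016", 120))  # True
print(can_set_level("SQL Server 2014", 130))  # False
```

Notice the asymmetry: a SQL Server 2016 database can be "hemmed" back to 100, but a 2014 database can't be set forward to 130.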
These values can be viewed in each database using a query in the corresponding command-line tool. In Oracle:
SELECT name, value, description from v$parameter where name='compatible';
Now if you're on database 12c and multi-tenant, then you need to ensure you're in the correct container first:
ALTER SESSION SET CONTAINER = <pdb_name>;
ALTER SYSTEM SET COMPATIBLE = '12.2.0';
And in SQL Server:

SELECT databases.name, databases.compatibility_level from sys.databases
GO
ALTER DATABASE <dbname> SET COMPATIBILITY_LEVEL = 120
GO
How many of us have heard, "You can call it a bug or you can call it a feature"? Microsoft has taken a page from Oracle's book and refers to the ability to set the database to the previous compatibility level as the Compatibility Level Guarantee. It's a very positive-sounding "feature", and those who upgrade and are suddenly faced with a business meltdown- whether from a surprise impact post-upgrade or simply from a lack of testing- are going to find it to be exactly that.
So what knowledge, due to many years of experience with this kind of feature, can the Oracle side of the house offer to the MSSQL community on this?
I think anyone deep into database optimization knows that "duct taping" around a performance problem like this- moving the compatibility back to the previous version- is fraught with long-term issues. This fix isn't scoped to a unique query or even a few transactional processes. Although it should only be a short-term measure before you launch to production, [we hope] experience on the Oracle side has taught us that databases can exist for years at a compatibility version different from the release version. Many DBAs have databases they're creating workarounds and applying one-off patch fixes for, because the compatibility either can't or won't be raised to the release version. This is a database-level way of holding the optimizer at the previous version. The WHOLE database.
You're literally saying, "OK kid, [database], we know you're growing, so we upgraded you to the latest set of pants, but now we're going to hem and cinch them back to the previous size." Afterwards we say, "Why aren't they performing well? After all, we did buy them new pants!"
So by “cinching” the database compatibility mode back down, what are we missing in SQL Server 2016?
Now there is a change I don't like, but I do prefer how Microsoft has addressed it in the architecture. Trace flag 2371 controls, via on or off, whether statistics are automatically updated after roughly a 20% change in row count. This is now on by default with MSSQL 2016 compatibility 130; if it's set to off, statistics at the object level aren't automatically updated. There are a number of ways to control this in Oracle, but it's getting more difficult with dynamic sampling enhancements that put the power of statistics inside Oracle and less in the hands of the Database Administrator. It requires about six parameter changes in Oracle, and as a DBA who's attempted to lock down stats collection, it's easier said than done- there were still ways Oracle was able to override my instructions at times.
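To make the difference concrete, here's a rough sketch of the two auto-update triggers. The exact internal formulas aren't fully documented; the "20% plus 500 rows" rule and the sqrt(1000 × rows) dynamic rule used here are the commonly cited approximations, so treat this as illustrative only:

```python
import math

def static_threshold(row_count):
    """Classic rule of thumb: stats update after ~20% of rows, plus 500, change."""
    return 0.20 * row_count + 500

def dynamic_threshold(row_count):
    """Trace flag 2371 / compatibility 130 behavior: the trigger point
    shrinks for large tables, approximately sqrt(1000 * row_count)."""
    return math.sqrt(1000 * row_count)

# On a billion-row table, the 20% rule waits for ~200 million modified
# rows before updating statistics; the dynamic rule fires around 1 million.
rows = 1_000_000_000
print(int(static_threshold(rows)))   # 200000500
print(int(dynamic_threshold(rows)))  # 1000000
```

The point of the dynamic behavior is that on very large tables, 20% of the rows is far too many modifications to wait for before refreshing statistics.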
There is also a flag to apply hot fixes, which I think is a solid feature in MSSQL that Oracle could benefit from, (instead of us DBAs scrambling to find out what feature was implemented, locating the parameter and updating the value for it…). Trace flag 4199 granted the DBA the power to enable any new optimizer features, but, just like Oracle, with the introduction of SQL Server 2016 this is now controlled by the compatibility mode. I'm sorry, MSSQL DBAs- this is one of those features that, (in my opinion,) I wish had spread cross-platform in the other direction.
As stated, the Compatibility Level Guarantee sounds pretty sweet, but the bigger lesson is the impact Oracle DBAs have experienced over the many releases that optimizer compatibility control has been part of our database world. We have databases living in the past- databases that are continually growing but can't take advantage of the "new clothes" they've been offered, and fixes we can't use because raising the compatibility to do so is deemed too risky. Nothing like being a tailor who can only hem and cinch. As the tailors responsible for the future of our charges, there is a point where we need to ensure our voices are heard- that we are not complacent bystanders, offering stability at the cost of watching the world change around us.
It's Friday and the last day of my first Summit conference. I've wanted to attend this conference for quite a few years, but with my focus on the Oracle side and its dates falling so close to Oracle Open World, I couldn't justify it before joining Delphix. When I was offered a spot to join the Delphix team and attend, I jumped at the opportunity. Look at this booth and these impressively talented Delphix employees- how could you resist??
The Summit event has around 5,000 attendees and is held in Seattle each year around the last week of October. It's held in the Washington State Convention Center, which is a great venue for a conference. I'd attended KSCOPE here in 2014 and loved it, so I was looking forward to enjoying this great location once more.
As I'm just re-engaging the MSSQL side of my brain, I have some catching up to do, and there was no better place to do it than at Summit. I was able to dig in deep, attend a few sessions around booth duty, and meet folks I'd rarely had an opportunity to interact with outside of Twitter before this event.
Brent Ozar came by the Delphix booth to say “Hi” and absolutely made my day!
There is always that challenge of learning about what you need to know and what you want to know- this event was no different. I had a list of features and products that I really need to come up to speed on, but with cool technology like Polybase, Analytics with R and performance talks, it could be a bit distracting from my main purpose here. I’ve always been a performance freak and I still find it pulling me from the database knowledge that is important to day-to-day tasks. I know this is why I find such value in Delphix- It frees the DBA from spending any extra time on tasks that I find more mundane so we can spend it on more interesting and challenging areas of database technology.
What I did learn was that many of the companies realizing the importance of virtualization, on-premise and in the cloud, aren't coming close to Delphix in features, or in ensuring they have the basics down first. You can't just talk the talk; you have to walk the walk, too. I'm proud of Delphix and what we've accomplished- that we don't say we can do something we can't, and that we continue on the path to support Azure in 2017. Azure was everywhere this year at Summit and will continue to be a major push for Microsoft.
Another important recognition at Summit was the percentage of women attendees and the women in technology, (WIT,) program at Summit, along with the support of everyone at the event for the WIT program.
The WIT luncheon and happy hour were on Thursday. In support of the day, over 100 men showed up in kilts. It may seem like a small gesture, but it shows in how the men and women interact at this conference and in the contributions of everyone at the event. There is a lot more collaboration, and the community is, on average, much more interactive and involved than I've experienced at any Oracle event. It's not to say that Oracle is doing it wrong, it's just that the SQL Server community is much farther ahead. They refer to their SQL Family, and they take it seriously in a good way.
Due to all of this, I was given the opportunity of a PassTV interview, met a number of incredible experts in the MSSQL community whom I'd previously known only on social media, and I appreciate being so warmly embraced.
I want to thank my great cohorts from Delphix who manned the booth with me at Summit. We had incredible crowds that kept us so busy answering questions, doing demos and talking about how virtualized databases can increase productivity and revenue. Sacha, Venkat, Jonathan, Dante and Jenny and Richie- you all ROCKED!
Thanks to everyone for making my first SQL Pass Summit conference AWESOME!!
I'll be attending my very first Pass Summit next week and I'm really psyched! Delphix is a major sponsor at the event, so I'll get to be at the booth and will be rocking some amazing new Delphix attire, (thank you to my boss for understanding that a goth girl has to keep up appearances and for letting me order my own Delphix wear.)
It's an amazing event, and for those of you who are my Oracle peeps wondering what Summit is- think Oracle Open World for the Microsoft SQL Server expert folks.
I was a strong proponent of immersing in different database and technology platforms early on. You never know when the knowledge you gain in an area that you never thought would be useful ends up saving the day.
Yesterday this philosophy came into play again. A couple of folks were having some challenges with a testing scenario in a new MSSQL environment and asked other Delphix experts for assistance via Slack. I'm known for multi-tasking, so I thought that while I was doing some research and building out content, I would just have the shared session going in the background while I continued to work. As soon as I logged into the web session, the guys welcomed me and said, "Maybe Kellyn knows what's causing this error…"
Me- “Whoops, guess I gotta pay attention…”
SQL Server, unlike Oracle, has always been multi-tenant for the broader database world. This translates to a historical architecture with a server-level login AND a database-level username: the login ID, (login name,) is linked to a user ID, (and user name,) in each user database, which acts much like a schema. Oracle is starting to migrate to a similar architecture with Database version 12c, moving away from schemas within a database and toward multi-tenant, where the pluggable database, (PDB,) serves as the schema.
I didn't recognize the initial error that arose from the clone process, but that's not uncommon, as error messages can change with versions and with proprietary code. I've also worked very little, if at all, with MSSQL 2014. When the guys clicked on the target user database in Management Studio and were told they didn't have access, it didn't take long to look at the login and user mapping and see that the login didn't have a mapping to a username for this particular user database. What was challenging them was that when they tried to add the mapping, (username,) for the login to the database, it stated the username already existed, and it failed.
This is where "old school" MSSQL knowledge came into play. Most of my database knowledge for SQL Server is from versions 6.5 through 2008. Along with a lot of recovery and migrations, I also performed a process very similar to the Oracle option to plug or unplug a PDB, referred to in MSSQL terminology as an "attach and detach" of a database. You could then easily move the database to another SQL Server, but you very often ended up with what are called "orphaned users." This is where the login IDs aren't connected to the user names in the database and need to be resynchronized. To perform this task, you could dynamically create a script to pull the logins if they didn't already exist, run it against the "target" SQL Server, and then create one that ran a procedure to synchronize the logins and user names:
USE <user_dbname>
GO
EXEC sp_change_users_login 'Update_One', '<loginname>', '<username>'
GO
For the problem experienced above, it was simply the delphix user that wasn't linked post-restoration due to some privileges, and once we ran this command against the target database, all was good again.
This wasn't the long-term solution, but it pointed to where the break was in the clone design, which can now be addressed. It shows that experience, no matter how benign it may seem, can come in handy later in our careers.
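For the curious, the dynamic script generation described above can be sketched like this. It's illustrative only- the database and user names are hypothetical, and on current versions Microsoft recommends ALTER USER … WITH LOGIN over the older sp_change_users_login procedure:

```python
def build_sync_script(dbname, orphans):
    """Generate the T-SQL that re-links orphaned database users to their
    server logins, one sp_change_users_login call per (login, user) pair."""
    lines = ["USE {0}".format(dbname), "GO"]
    for login, user in orphans:
        lines.append(
            "EXEC sp_change_users_login 'Update_One', '{0}', '{1}'".format(login, user)
        )
        lines.append("GO")
    return "\n".join(lines)

# Example: re-link the delphix user after a restore, (hypothetical names.)
print(build_sync_script("mydb", [("delphix", "delphix")]))
```

In practice you'd feed it the list of orphans reported by the server, (e.g. from EXEC sp_change_users_login 'Report',) rather than a hand-typed list.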
I am looking forward to learning a bunch of NEW and AWESOME MSSQL knowledge to take back to Delphix at Pass Summit this next week, as well as meeting up with some great folks from the SQL Family.
See you next week in Seattle!