Out of the Blue - Michael Trafton's Blog

New Article: Architecting Endeca for Large Data Deployments

December 28, 2007

In our Information Access practice, we often find ourselves implementing Endeca to help our clients find documents and other information easily and intuitively. Some of our engagements involve massive amounts of data, and it can be challenging to design a reliable, scalable architecture when you are dealing with millions of records and gigabytes of data.

Dan Burton, one of our Solution Architects, just published an article full of tips on how to deploy Endeca in environments that have huge datasets. Learn more by reading Architecting Endeca for Large Data Deployments.

New Article: Technical Challenges Faced During Content Migrations

December 27, 2007

Over the years at Blue Fish, we have helped lots of clients migrate documents and other data from file systems, databases, and legacy systems into the content management systems we help them design, develop, and deploy. We call these efforts “Content Migrations”, and we have an entire Content Migrations practice area devoted to helping our clients perform these migrations quickly and effectively.

“Why does Blue Fish need an entire practice focused on Content Migrations?” you may ask. After all, the concept seems pretty straightforward. But these migrations can be a lot harder than they look. I can’t tell you how many times we’ve had clients budget a week or two for their migration only for it to remain incomplete months later.

Pete Nevin, one of the consultants in our Content Migrations practice, just published an article detailing some of the common mistakes we see companies make when they try to tackle a migration. The article is called Technical Challenges Faced During Content Migrations, and it walks through the typical approach most people use when performing a content migration, along with several things to watch out for at each step of the process.

New Article: Integrating Web Publisher and WordPress

December 18, 2007

As you might imagine, here at Blue Fish, we “eat our own dogfood” by authoring and maintaining the Blue Fish Web site using EMC Documentum Web Publisher and Blue Fish Navigation Manager. This allows us to test the latest patches and releases in a real-world environment before they reach our clients’ production systems.

I’m sure you have also figured out that this blog is powered by some off-the-shelf blog software - in this case, it’s powered by WordPress, one of the most popular blogging platforms out there.

What you may not have known is that all the Enterprise Content Management articles that we publish here on the Blue Fish Web site are also powered by WordPress. We do this so that our readers can easily give us feedback by rating the articles and commenting on them.

Since we use Web Publisher to write our articles and WordPress to publish them, we needed to integrate the two platforms.

Marc Perlman just published an article on how we did this. It’s called Integrating Web Publisher and WordPress, and it’s an interesting read.

Why Google’s Search Appliance is Un-Googley

November 2, 2007

I love Google. I love almost everything about them. I love their motto (Don’t Be Evil), I love their employee-focused work environment, and of course I love their web/image/news/video search tools. I also love most of their other products - Google Maps was groundbreaking when it was introduced, Gmail has taken web-based email to the next level, and I use Google Reader every day as my news and blog aggregator. I’m a devoted Google-lover because everything they do is so Googley.

But one of Google’s products stands out as very Un-Googley. You may not know it, but Google has packaged its search technology into a standalone appliance that it sells to companies for use inside their firewalls as an Enterprise Search Engine. Enterprise Search Engines are used to find corporate information that has been stored in various nooks and crannies of file systems, internal web sites, and content management repositories. Although Google has the best Internet search engine in the world, it has one of the worst Enterprise Search Engines. To find out why, let’s start from the beginning.

Back in the 90s, there were a lot of researchers thinking about search technology. Information on the Internet was growing at an astounding rate (it still is) and was driving the market for search tools through the roof. Back then, the main problem researchers were trying to solve was the “query expression” problem - that is, they wanted to make it easier for users to tell the computer what they were looking for. A lot of effort was put into natural language processing, and out of this research came search engines like Ask Jeeves, where the user could literally type in a question like “What store has the least expensive shoes?”

When Google came along, it blew these search engines out of the water. It turns out that all these researchers were trying to solve the wrong problem. Google showed us that the hard problem in search technology was not in expressing the query, it was in expressing the results - specifically, in placing the most relevant results at the top of the list.

Prior to Google, search engines determined relevancy by how many times the search term appeared in the web page and by where it appeared. If the word “Penguin” appeared in the title of the page and also appeared several times in the body of the page, that page was deemed to be more relevant to penguins than a page that only contained the word once. This led unscrupulous webmasters to game the system, filling their pages with the most popular search terms repeated over and over, often in a tiny font the same color as the background. It wasn’t uncommon for these unscrupulous web sites to be near the top of a search engine’s result list.
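To see why this was so easy to game, here is a minimal Python sketch of keyword-count scoring. The function name, the 5x weight for title matches, and the two sample pages are illustrative assumptions, not the formula any particular engine used.

```python
import re


def naive_score(title: str, body: str, term: str, title_weight: int = 5) -> int:
    """Keyword-count relevance: matches in the title count extra.

    The title weight of 5 is an arbitrary, illustrative choice.
    """
    term = term.lower()
    title_hits = re.findall(r"\w+", title.lower()).count(term)
    body_hits = re.findall(r"\w+", body.lower()).count(term)
    return title_weight * title_hits + body_hits


pages = {
    "zoo guide": ("Penguin Exhibit",
                  "Our penguin colony lives in a chilled habitat. Each penguin is tagged."),
    # Keyword stuffing: the term repeated 50 times (imagine it in tiny,
    # background-colored text).
    "spam page": ("Cheap Stuff", "penguin " * 50 + "buy now"),
}

ranked = sorted(pages, key=lambda name: naive_score(*pages[name], "penguin"), reverse=True)
print(ranked)  # prints ['spam page', 'zoo guide']
```

The stuffed page scores 50 against the zoo guide's 7, simply because it repeats the word more often, even though the zoo guide is clearly the better answer.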

Google uses a different approach for determining whether a web page is relevant. Each time a web page links to another web page, Google treats that link as a “vote”. Web pages that get the most votes, especially votes from pages that are themselves popular, are considered to be the most relevant. This algorithm, called PageRank, rests on a simple premise: if your page is the most popular (a lot of other pages link to it), it is likely to be the most relevant.
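Here is a simplified, textbook-style sketch of the “vote” idea in Python, computed by power iteration. It is not Google’s production ranking, which combines PageRank with many other signals; the 0.85 damping factor is the value suggested in the original PageRank paper, and the small link graph is made up for illustration. Because each page passes its own score along its links, a vote from a popular page is worth more than a vote from an obscure one.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the pages it links to; every link target
    must also be a key. Scores always sum to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its own score evenly among the pages it votes
                # for, so a vote from a high-scoring page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {
    "home": ["penguins", "about"],
    "penguins": ["home"],
    "about": ["home"],
    "blog": ["penguins"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:10s} {ranks[page]:.3f}")
```

In this toy graph “home” comes out on top because two pages vote for it, and “blog”, which nothing links to, comes last. If you feed the function a graph with no links at all, which is roughly what a folder of Word and Excel files looks like, every document ends up with exactly the same score, so the link votes cannot tell them apart.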

This approach works amazingly well for Internet searches, and I almost never have to look at the second page of results when I search for something on Google.

The problem with Google’s Search Appliance is that although the PageRank algorithm works great for web pages, it doesn’t work at all for Word documents, Excel spreadsheets, PowerPoint presentations and the like, because these documents don’t link to each other the way web pages do. And the information that companies most want to search for is contained within office documents like these. Google’s inside-the-firewall search appliances aren’t any better than Internet search engines of the early 1990s.

The biggest issue with all of this is that Google’s search appliances diminish their brand. When I use Google inside my firewall, I have to scroll through pages of results to find the document I’m looking for. This just doesn’t measure up to the expectations that I have for a Google product - it’s just not Googley enough.

How to Build a “Best Place to Work”

August 16, 2007

A few weeks ago, Blue Fish was selected by the Austin Business Journal as one of the best places to work in Austin. But it hasn’t always been that way. Six months after I started the company, more than half of my employees resigned. That was a really low point for me, but we’ve come a long way since then.

Back when I started Blue Fish, my main goal was to build “a company where the people I want to work with want to work”. I didn’t quite know how to achieve this, but I did know that, given the amount of time I spend at the office, I wanted to spend that time with people I enjoy being around.

And of course I blew it. In my haste to get Blue Fish off the ground, I didn’t spend much time interviewing at all. I hired the first people I found who had the skills I needed, and I didn’t think much more about it. Then I learned the hard way that a random collection of people will produce random results. My team didn’t work well together, they weren’t all on the same page, and they got fed up and quit. I had to start over from scratch. And I needed to pay a lot more attention to who I was hiring.

Fast forward to the present. Blue Fish has the best team I’ve ever worked with, and we’re all pulling in the same direction. To get here, I had to answer two important questions: Who do I want to work with, and what do these people want in their employer?

The answer to the first question is easy. I want to work with people who are:

  • “Bad-Asses” - They are among the best at what they do.
  • Multi-Disciplinary - I’ve always been drawn to people with a variety of passions.
  • Friendly - Life is too short to spend it with jerks.
  • Client-Focused - They put the client’s needs above their own.

The answer to the second question is a lot harder. Eventually, I started surveying the employees at Blue Fish to understand this better. Here’s what I’ve found out - the people I want to work with want the following:

  • They want to work with smart and talented people.
  • They want a casual, fun work environment.
  • They want to have an impact on the business.
  • They want to produce a quality product.

So how have we built a company that attracts the type of folks we want to hire? I like to describe our culture by saying that although we do serious work, we don’t take ourselves too seriously.

First of all, we know how to have fun. We’ve built an employee lounge, complete with sofas, video games, a full bar and a kegerator. We also throw happy hours to welcome new team members, and we go offsite to celebrate our successes - earlier this year, we took the entire company on a ski trip.

Second, we respect, trust, and empower our employees. We have an open book policy, and once a month, the executive team shares the company financials with all the employees. Employees also vote on the corporate initiatives that are most important to them, and these are the initiatives that we tackle throughout the year. In 2007, for example, we’re working on improving our estimation practices, developing a training curriculum for new hires, and increasing cross-project knowledge sharing.

Third, we focus on client success. We’ve made a commitment to our clients and our employees - we won’t take on a project unless we believe we can be successful. We survey our clients to understand what we’re doing right and where we can improve. And we place a premium on quality so that our employees can be proud of the solutions they deliver.

When I look back at how far we’ve come, I’m incredibly proud. But we can’t rest on our laurels - Blue Fish is in growth mode, and keeping it a great place to work gets more challenging the bigger we get and the faster we grow. Now that we’re an official “Best Place to Work,” the bar has been raised. But if we stay focused on our culture and keep building a great team, I think we have a shot at winning again next year.

Article: Introduction to Documentum’s Content Server OEM Edition

July 10, 2007

Earlier this year, EMC released a version of its Documentum Content Server that is targeted at Independent Software Vendors (ISVs). Documentum’s hope is that other software companies will embed the Content Server into their products, similar to the way that Documentum embeds products such as FAST’s search engine. To make it easier for ISVs to incorporate the Documentum platform in their products, EMC created the Documentum Content Server OEM Edition. Thomas Hughes, one of the consultants here at Blue Fish, wrote an article that gives an overview of the OEM Edition. Check it out at Introduction to Documentum’s Content Server OEM Edition.

Article: Testing DFC Offline Using Mock Objects

July 10, 2007

One of the most challenging aspects of developing with a technology like Documentum is testing your application to ensure that it behaves properly. A few years ago, we created a unit testing framework that helps our developers write unit tests against live Documentum repositories. We’ve written an article about this approach and have given two presentations about it as well.

Lately, we’ve been using a different method for unit testing our DFC code. It uses mock objects so that a live connection to a Documentum repository is no longer necessary. Steve McMichael, one of the developers here at Blue Fish, pioneered this approach, and he wrote an interesting article about it. Check it out at Testing DFC Offline Using Mock Objects.
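DFC itself is a Java API, but the mock-object idea is language-neutral. As a rough illustration only, here is the same idea in Python with the standard library’s unittest.mock: the code under test talks to a session object, and the test hands it a fake that returns canned data instead of connecting to a repository. RepositorySession, get_documents, and count_unreviewed are hypothetical names made up for this sketch; they are not DFC classes or methods.

```python
from unittest.mock import Mock


class RepositorySession:
    """Hypothetical interface for a live repository session.

    In real code this would wrap the vendor API and need a network
    connection; the offline test below never instantiates it.
    """

    def get_documents(self, folder_path: str) -> list[dict]:
        raise NotImplementedError


def count_unreviewed(session: RepositorySession, folder_path: str) -> int:
    """Hypothetical business logic under test: count documents in a folder
    whose status is anything other than 'Reviewed'."""
    documents = session.get_documents(folder_path)
    return sum(1 for doc in documents if doc["status"] != "Reviewed")


def test_count_unreviewed_offline() -> None:
    # spec= makes the mock reject methods the real interface doesn't have,
    # so a typo in the code under test fails instead of silently passing.
    session = Mock(spec=RepositorySession)
    session.get_documents.return_value = [
        {"name": "spec.doc", "status": "Reviewed"},
        {"name": "plan.doc", "status": "Draft"},
        {"name": "notes.doc", "status": "Draft"},
    ]

    assert count_unreviewed(session, "/Projects/Alpha") == 2
    # The mock also records how it was called, so the test can check that.
    session.get_documents.assert_called_once_with("/Projects/Alpha")


test_count_unreviewed_offline()
print("offline test passed")
```

The test runs in milliseconds with no repository, network, or test data to set up; the trade-off is that it only checks your logic against the behavior you assumed for the mock, so some tests against a live repository remain worthwhile.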

Presentations from EMC World

June 20, 2007

Earlier this year, Blue Fish gave three presentations at EMC World 2007, EMC’s user and developer conference. All three presentations are now available in the Articles section of our web site.

Final Day of EMC World 2007

May 24, 2007

Today was the final day of EMC World, and I was able to catch one session this morning before my flight left. The session was an overview of the upcoming D6 release of WebTop, Documentum’s multi-purpose document management application.

This release of WebTop seems to be focused on configurability and usability. Here are some of the highlights.

  • In the out-of-the-box experience, WebTop has removed the “streamline” interface. Although you can still turn it on via a configuration setting, the default view is an improved version of the “classic” interface - the one that looks a lot like Windows Explorer, with a folder tree on the left and a table view of the folder’s contents on the right.
  • WebTop will now have Resizable Columns, something that I have been wanting for quite a long time.
  • As I mentioned a couple of days ago, WebTop will still support the ability to multi-select documents. But instead of having to click checkboxes to select documents, the user can now shift-click and control-click just as she would in other Windows applications.
  • To make it easier to perform actions on a document, WebTop now supports a right-click menu and keyboard shortcuts. There’s even support for double-click that will bring up an actions menu, although I didn’t get a good sense of why this is better than right-click. Developers can easily add their own functions to the right-click menu and create custom keyboard shortcuts as well.
  • WebTop now includes auto-complete for various form fields on the properties dialog and other screens.
  • All listing pages, such as the ones that show the contents of folders or categories, now include a “Starts With” panel that lets the user filter the current result set by typing the first few letters of a document name.
  • WebTop now supports one of the most commonly-requested features of all time: setting attributes on multiple documents at the same time. You can also update the permissions or lifecycle state of multiple documents simultaneously.
  • You can now set a document’s lifecycle state on the properties screen and right-click to promote the document.
  • WebTop now supports the ability to export any listing page as a CSV (comma-separated values) file so that it can be easily opened in Excel. The export will include the entire result set, not just the documents on the current screen.
  • Some improvements have been made to the Drag and Drop functionality. You can now drag documents together to create relationships between them, add a rendition by dragging a file onto an existing document, and check in from file by dragging a file onto a document you had previously checked out.
  • In previous versions of Documentum, a user had to be a full-fledged system administrator in order to have access to any of the administrative functionality. In D6, you will be able to grant users access to specific administrative features, allowing you to delegate tasks such as user and group management.
  • One of the biggest changes to WebTop is the addition of Presets that allow power users to configure the user interface for various roles. Using a GUI, you will be able to (for example) limit the object types, formats, ACLs, lifecycles, and templates that are seen by a user when she imports a document. You will also be able to configure an object type and default attributes that will be applied when a document is imported into a certain folder.

Day Two from EMC World 2007

May 22, 2007

Tonight, Blue Fish held its annual client appreciation dinner. This is our opportunity to say thanks to all of our clients who have supported us over the years. This year, the site of the event was the Shark’s Underwater Grill at SeaWorld. We ate in a giant aquarium, surrounded by sharks, stingrays, and other fish. It was quite a sight. Overall, more than 60 of our clients joined us, and it was a lot of fun.

Earlier today, I spent some time learning about the architectural improvements in the upcoming D6 release of Documentum. Below are some of the things that caught my eye today.

Documentum Foundation Services

  • EMC is releasing a set of out-of-the-box web services called Documentum Foundation Services (DFS). DFS provides a higher-level interface for interacting with the content server, encapsulating several DFC calls into a single DFS method. Most of the services you would need to do basic content management will ship in D6, and EMC has committed to eventually providing DFS services for 100% of the Documentum platform, including services for Rich Media and Web Content Management.
  • If you are calling DFS locally (from within the same application server), you’ll be able to use a DFS Client Library to call DFS directly, without the performance penalty of marshalling and unmarshalling XML/SOAP calls. If you later decide to deploy your application remotely, you can flip a switch and your application will start using the web services interface to DFS.
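The “flip a switch” design in the second point is a familiar pattern: application code depends on a service interface, and configuration decides whether calls go to an in-process implementation or to a remote web service. The Python sketch below shows only that general pattern. ObjectService, get_title, make_service, the “mode” setting, and the transport callable are all hypothetical and are not the DFS client library’s actual API.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ObjectService(ABC):
    """Hypothetical service interface that application code depends on."""

    @abstractmethod
    def get_title(self, object_id: str) -> str: ...


class LocalObjectService(ObjectService):
    """In-process implementation: a plain method call, no XML/SOAP marshalling."""

    def __init__(self, repository: dict[str, str]) -> None:
        self._repository = repository  # stand-in for the local content server

    def get_title(self, object_id: str) -> str:
        return self._repository[object_id]


class RemoteObjectService(ObjectService):
    """Remote implementation: packages each request for a web service endpoint.

    The transport is injected so this sketch stays runnable; a real one would
    serialize the payload (e.g. as SOAP) and make an HTTP call.
    """

    def __init__(self, endpoint_url: str, transport: Callable[[str, dict], dict]) -> None:
        self._endpoint_url = endpoint_url
        self._transport = transport

    def get_title(self, object_id: str) -> str:
        response = self._transport(self._endpoint_url,
                                   {"operation": "get_title", "id": object_id})
        return response["title"]


def make_service(config: dict, repository=None, transport=None) -> ObjectService:
    """The "switch": pick an implementation from configuration alone."""
    if config.get("mode") == "remote":
        return RemoteObjectService(config["endpoint_url"], transport)
    return LocalObjectService(repository)


def print_title(service: ObjectService, object_id: str) -> None:
    # Application code is identical in both modes.
    print(service.get_title(object_id))


repo = {"doc-1": "Q3 Plan"}
print_title(make_service({"mode": "local"}, repository=repo), "doc-1")

# Fake transport standing in for the network, so the remote path runs offline.
fake_transport = lambda url, payload: {"title": repo[payload["id"]]}
print_title(make_service({"mode": "remote", "endpoint_url": "http://example.invalid/services"},
                         transport=fake_transport), "doc-1")
```

Moving to a remote deployment is then a configuration change only; print_title and the rest of the application never learn which implementation they received.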

Java DMCL

As I mentioned yesterday, there is now a Java DMCL. I learned a bit more today.

  • The old DMCL was optimized for client-server environments where a thick client acquired a session and kept it for a long time. Using this DMCL in a web environment was very inefficient, since application servers typically keep a session alive only for the length of the request (less than a second). The new Java DMCL is optimized for a web environment.
  • The Java DMCL has eliminated the need for a JNI bridge to allow DFC to communicate with the old DMCL. This has actually improved performance by 25% for single-user applications and by 40% for multi-user applications. There is also a reduction in memory usage, allowing your application server to process more concurrent connections than it could previously.