Thursday, 9 April 2015

Software Architecture - Examples

In this post I show how the same application can be created without any architecture, with an architecture completely bound to a technology and its limitations, and with an architecture that I consider to be "unbound" from technologies and their limitations.
In the article Software Architecture I said that I usually start the architecture of an application by thinking only about how to solve the problem using the idea of a technology, without considering the limitations of actual implementations. Only later do I try to choose a technology, and in many cases I decide to write my own because the existing ones don't accommodate my needs.
I know it is hard to grasp that idea without examples, so in this post I will show application designs ranging from the complete lack of architecture, to an architecture bound to a technology, to an architecture that I consider unbound from technologies and their limitations.

Fictional Purpose

Create a simple web application that lists categories (with any level of sub-categories) and products. The application will not edit anything and there's no e-commerce at all. The data will be entered using another application and sales will be done by phone or by some other means.
It is actually planned that in the future there will be an editor application and even e-commerce on the site, but that depends on the success of the current code. So those other applications don't need to be written now, but it is good to be prepared to receive them.

No Architecture - "Being Too Agile"

I don't want to criticize the Agile methodology, as I think it has many valid points. Unfortunately, many developers justify the complete lack of architecture as being "Agile", and that's why I am using "Being Too Agile" as this topic's title.
This "too agile" may happen because there's no discussion at all and developers simply start doing things, or because the discussions that do happen before starting focus on the wrong things, like how to name private members and database tables, instead of on code reuse and the evolution of the application. That is:
  • A database is immediately created using the database the team is most used to (for example, SQL Server). Two tables are created, Category and Product (possibly under different names, depending on the team's standards). At this moment the team is minimalist, so there are only the columns Id, IdParentCategory (which can be null) and Name for categories, and Id, IdCategory, Name and Description for products. No other tables or columns. Obviously, some fake data is entered;
  • The web application is created with a single form. To "reuse" code, a DBHelper class (completely bound to SQL Server, with only two methods: ExecuteNonQuery and GetDataTable) is copied from another project, and the connection string, which happens to be hard-coded in it, is changed to point to the new database. The main form has a tree view for the categories and a list box for the products. The base categories are loaded in the constructor (using the DBHelper class to return a DataTable and iterating it to populate the tree, in the format "Id - Name"). When any category is clicked, the category Id is extracted from the item's text and concatenated into the base queries (hard-coded in the event handlers) for products and sub-categories.
As you can imagine, all the code is in the form's code behind and the only class added to the project is the DBHelper class.
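As a sketch of the pattern just described (the class and method names here are my own; in the original design this logic lives directly inside the form's event handlers):

```csharp
using System;

// Hypothetical sketch of the "Id - Name" handling described above.
public static class NoArchitectureSketch
{
    // Tree items are displayed as "Id - Name"; the Id is recovered
    // by taking everything before the first " - ".
    public static int ExtractId(string treeItemText)
    {
        int separatorIndex = treeItemText.IndexOf(" - ");
        return int.Parse(treeItemText.Substring(0, separatorIndex));
    }

    // The query is built by concatenating the id to a base query,
    // exactly the pattern that opens the door to SQL Injection when
    // the concatenated value comes from user-controlled text.
    public static string BuildProductsQuery(int categoryId)
    {
        return "SELECT Id, Name, Description FROM Product WHERE IdCategory = " + categoryId;
    }
}
```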
To me this is the complete lack of architecture, but some people (in particular managers who took only one or two programming lessons but believe they know much more than they actually do) see only good points, like:
  • Everything that happens on the form can be found by looking at the form's code behind file. There's no need to navigate many "layers of files", there's no complexity to find the implementation of an interface or anything like that;
  • As the DBHelper is built for a specific database, it avoids the "slow" virtual calls;
  • Changing the code to use another database is possible, it is enough to change the DBHelper class source code;
  • The application is up and running pretty fast;
  • Junior developers can maintain the application.
The bad points? Probably none to someone who completely agrees with all the "good points" I just presented. If that's not the case, then there are lots of problems, but I believe they will become obvious when reading the rest of this post.

Code Reuse... Or Not

The customer sees the application. Obviously he requests changes to the UI, layout etc., but I will ignore those here. The application is running and there's no request to change its code or architecture.
Yet the customer thinks it would be great to have a native application too, so things can run faster and outside the browser. Let's not discuss whether browsers are fast enough or whether the performance difference will even be noticed. Let's simply accept that a native application will be created.
Can we reuse any of the existing code?
Aside from copying the DBHelper to the new project or copying the hard-coded queries from the current web application, there's nothing to reuse. At best we can use the current project as "inspiration" for the new application, looking at its code to take the parts we want, but there's no direct reuse, like importing a library.
Considering this is a really small application, a copy would be OK, but this is the moment when future changes start to be taken into account, and it seems better to change one place than two places every time. One of the first changes considered is: "What if in the future we should not show some categories (like empty ones) or products (that have a deleted flag, an expiration date or similar)?"
As the developers don't see any way to reuse the UI, they want to reuse the database queries and they think about these two solutions:
  • Create a Queries unit that will contain all the queries of the application, and the same file will be shared by both applications (not copied);
  • Create Stored Procedures to do the job in the database.
And I can tell that in most cases the second solution will be used. There are actually many arguments to go in that direction:
  • Changes to the database will not require change to the applications or recompilation of the applications;
  • Stored Procedures are stored in an optimized way inside the database, and some even further optimize themselves according to use, so they are expected to be faster than executing ad-hoc queries from the code;
  • It is said that stored procedures avoid SQL Injection because they are parameterized, but this is a half-truth (the procedure itself is parameterized, but code that concatenates strings to build the EXEC PROC call is still susceptible to SQL Injection);
  • Half-truth aside, it is true that direct access to the tables can be forbidden. A DBA can protect the database by forbidding direct table access, preventing deletes and updates when there are no procedures for such actions, and preventing a query from taking too long because a SELECT was issued without a WHERE clause;
  • There's a standard in the code: every call to the database is an EXEC PROC followed by a procedure name and all the parameter values it needs.
So, this is the direction the team takes.
In this case it happened early, while the application only has 3 queries. That means 3 stored procedures are created and the web application is changed to call them instead of doing direct SELECT commands. It would be terrible if this change happened after having 50+ queries.
For now, consider that database parameters are not used, so the code is susceptible to SQL Injection. But there's nothing an SQL Injection can do to corrupt the database at this moment, as only the stored procedures are accessible, they are read-only and all the data is public.
With this database change it is now possible to create the native application and "reuse some code". Even the "Id - Name" formatting for the category is done by the stored procedure now, and the application code doesn't extract the Id anymore (I will not even explore what happens if the Id can't be displayed). The stored procedures receive the entire category text and extract the Id themselves. The developers need to recreate the UI, but the "real logic" of the queries is reused. The only code in the application appends the tree view item's Text to the end of an EXEC PROC call, reads the results and creates new tree view items or populates the list box.

Connectivity Problem

To be honest, most DBAs would never allow the database to be externally visible, but let's say the team is not really working with a DBA. They are simply "solving problems fast" and the fastest thing to do was to allow the native application to directly connect to the database.
It worked on some tests but there are two main problems when used in production:
  1. It doesn't work if a proxy server is required to connect to the internet;
  2. If too many native applications are connected, there's an excessive load on the database server, even if most of the connections are inactive.
For the first problem there's no easy solution, as the database connections simply can't pass through a proxy server.
The root cause of the second problem is that the connection string used by the DBHelper class enables a connection pool. It doesn't matter that the pool only has one connection: every native application is keeping one connection alive. The connection pool is great in the web application, where the same connections are reused regardless of the client, but it is terrible in the native application that runs on the clients' computers.
To solve this issue the connection pool is disabled for the native application. As the DBHelper is a copy, the change doesn't affect the web application. But now the native application becomes slow... in many cases much slower than the web application, as it spends more time connecting to and disconnecting from the database than reading the data.
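For illustration (the server and database names below are invented), the only difference between the two DBHelper copies ends up being the connection string, as SQL Server connection strings control pooling with the Pooling keyword:

```csharp
// Hypothetical connection strings; server/database names are made up.
public static class ConnectionStrings
{
    // Web application: pooling on (the default), so the same physical
    // connections are reused across requests on the server.
    public const string WebApplication =
        "Server=db01;Database=Catalog;Integrated Security=true";

    // Native application: pooling disabled, so each query opens and
    // closes a real connection instead of keeping one alive per client.
    public const string NativeApplication =
        "Server=db01;Database=Catalog;Integrated Security=true;Pooling=false";
}
```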

Web Service

It seems that the only valid solution to keep the database server free and to pass through proxies is a web service. The web service can actually live on the same server as the web application, sharing its connection pool.
So, the decision is to create a web service to represent all the methods they are using from the DBHelper:
interface IWebService
{
  string GetDataTableAsXml(string commandText);
}
Yeah, they are using only one method. A web service like this is clearly not how web services are supposed to work, but it is a work-around that allows the code to keep using a DBHelper class and doing those EXEC PROC calls.
So the actual job is to create a web service that executes the received command and converts the DataTable to a string, and to change the native application's DBHelper class to call the service and convert the string result back to a DataTable. The application itself continues to use the DBHelper class and the same SQL, so this is the smallest change possible right now.
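The conversion itself is not hard, as DataTable already knows how to serialize itself to XML. A sketch (the class name is mine; the schema is included in the XML so the client side can rebuild the table structure):

```csharp
using System.Data;
using System.IO;

public static class DataTableXmlConverter
{
    // Server side: run the command (omitted here) and return the
    // resulting DataTable as an XML string, schema included.
    // Note: the DataTable must have its TableName set, or WriteXml throws.
    public static string ToXml(DataTable table)
    {
        using (var writer = new StringWriter())
        {
            table.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Client side: rebuild the DataTable from the XML string so the
    // rest of the DBHelper-based code keeps working unchanged.
    public static DataTable FromXml(string xml)
    {
        var table = new DataTable();
        using (var reader = new StringReader(xml))
            table.ReadXml(reader);
        return table;
    }
}
```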
It works. Many developers would be afraid to deal with that kind of "architecture" (or would try to kill someone), but it works. Bad things will happen if the Id can't be shown in the category text, but that's not a requirement right now.

Starting Differently

If you think that the previous solution was terrible, well, I completely agree with you. Yet it shows a problem that happens frequently:
  1. An application is created with a bad or nonexistent architecture;
  2. All the changes that come later are required to be minimal (usually by time constraints, or because the "architects" don't accept that their "architecture" is completely flawed), being more work-arounds than design/architecture changes.
But what would happen if they had started differently? What if, from the beginning, it was said that a stateless web service was required, capable of listing categories, sub-categories and products?
I can see that a web-service interface like this would be written:
interface IWebService
{
  // Gets all the categories from the given path.
  // To get the base categories, pass null or an empty string.
  // Returns only the sub-category names. 
  // To create the full path, use the previous path + "/" + categoryName.
  string[] GetCategories(string path);

  ProductInfo[] GetProducts(string path);
}
And ProductInfo will have all the product info that's not already present in the call. That is, it will not have the category path, as that comes in the request, but it will have Name, Description and any new column that may become necessary to present the product.
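A minimal ProductInfo and the path convention from the interface comments might look like this (a sketch; any member beyond Name and Description, and the helper class name, are my own assumptions):

```csharp
// Hypothetical ProductInfo: only what isn't already in the request.
public class ProductInfo
{
    public string Name { get; set; }
    public string Description { get; set; }
}

public static class CategoryPaths
{
    // The convention from the interface comments: the full path is the
    // previous path + "/" + the sub-category name; null or empty means
    // the base level.
    public static string Combine(string previousPath, string categoryName)
    {
        if (string.IsNullOrEmpty(previousPath))
            return categoryName;
        return previousPath + "/" + categoryName;
    }
}
```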
I am not going too far with the changes, so keep the idea of that DBHelper class and string concatenation when dealing with the database.

The applications

In this case the web application can either use the web service as an external service or invoke the web service implementation directly; after all, they are on the same server.
The client application would use the service from the start and the problems related to direct database access would never exist.
Considering only the service accesses the database, it is possible that the stored procedures are never created (remember that in the other situation the stored procedures were created to share some code, and here the service is what's shared). So all the queries can be part of the web service itself.
Of course there's a big difference from before: there are no IDs being sent from the client to the server anymore. The paths are created by combining the selected category with all its parents (code that will probably be copied into both applications, as it is not that big). This also means the service will probably split the path to find each category ID, executing many more queries when a 10th-level category is used.
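On the service side, resolving a path means splitting it and looking up each level in turn. A sketch (names are mine; the lookup delegate stands in for the one-database-query-per-level the text describes):

```csharp
using System;

public static class PathResolver
{
    // lookupIdByParentAndName represents one database query per level:
    // given the parent category id (null for the base level) and a
    // category name, it returns that category's id.
    public static int ResolveCategoryId(
        string path,
        Func<int?, string, int> lookupIdByParentAndName)
    {
        int? currentId = null;
        // A 10th-level category means 10 lookups: one per path segment.
        foreach (var name in path.Split('/'))
            currentId = lookupIdByParentAndName(currentId, name);
        return currentId.Value;
    }
}
```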
If such navigation of the categories becomes a problem, the future optimizations will probably include:
  • Caching results on the service. At least while the database is read-only, any kind of cache that avoids new database calls would be great;
  • Creating an extra table with the full path and the category ID. It would then be possible to do a single query for products or sub-categories, with an inner join to that new table using an equality comparison on the full path;
  • Simply putting the full path as a new field in the product and category tables. This is probably the solution that uses most database space, but it avoids a new table and the joins.
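The first option is trivial while the data is read-only. A sketch of a cache layered over the category listing (the class name is mine; getCategories stands for whatever function actually hits the database):

```csharp
using System;
using System.Collections.Generic;

public class CategoryCache
{
    private readonly Func<string, string[]> _getCategories;
    private readonly Dictionary<string, string[]> _cache =
        new Dictionary<string, string[]>();

    public CategoryCache(Func<string, string[]> getCategories)
    {
        _getCategories = getCategories;
    }

    // While the database is read-only a result never goes stale,
    // so it can simply be kept forever.
    public string[] GetCategories(string path)
    {
        string key = path ?? "";
        string[] result;
        if (!_cache.TryGetValue(key, out result))
        {
            result = _getCategories(path);
            _cache[key] = result;
        }
        return result;
    }
}
```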
The problem shared with the first case is that, by using that DBHelper and string concatenation, it is still possible to suffer SQL Injection.
As my purpose is not to discuss SQL Injection but different architectures, I want you to think about this:
Can you see how the data-related code was written in a completely different manner simply because we started with a stateless web service?
I am not saying the applications are much better now. They still have the events written in the code behind, using foreach over the results to create new tree view items, using the data from the UI components to get the category paths and completely ignoring design patterns like MVVM or MVC.
Yet the decision on how to query the database is different. The queries executed are probably going to be different. And the applications don't contain any database query anymore (even if EXEC PROC doesn't show what happens internally, it is a database query). Actually, it is even possible to write a web service that never accesses a database, using an XML file or something completely different, and the applications don't need to change.

The Basic Object-Oriented Approach

Both solutions presented until now are completely different from the most basic object-oriented (OO) approach.
Anyone who knows the basics of object-oriented programming will naturally think about two classes: Category and Product. I am not talking about database tables. I am talking about classes.
The Category would probably have these members:
static SomeCollectionType<Category> BaseCategories { get; }
static Category GetBaseCategoryByName(string name);

Category Parent { get; } // Can be null
string Name { get; } // Can't be null

SomeCollectionType<Category> SubCategories { get; }
SomeCollectionType<Product> Products { get; }
// I will soon discuss the SomeCollectionType

Category GetSubCategoryByName(string name);
Product GetProductByName(string name);
And the Product will probably have these properties:
Category Category { get; }
string Name { get; }
string Description { get; }
// Any other property that seems necessary, like Price, Picture etc.
Thinking about classes to represent Category and Product, but still forgetting about a better architecture, developers could simply implement the SubCategories and Products properties to call the DBHelper.GetDataTable method, making these classes completely bound to the database. There's a big chance they will also expose an Id property.
Also, whether to use real properties for the collections or methods like GetSubCategories() and GetProducts(), and whether the result type is an array, an IEnumerable or something different, will greatly depend on how well the developers know object-oriented principles, how much they tie the result types to the actual implementation and how well they follow the .NET guidelines.
  • The use of methods is recommended to make it clear that the action may take time, but this somewhat binds it to an implementation (what if all the items are in memory already?);
  • The use of properties is the most common way to represent child items in most cases, but it lacks the information that a slow database call may happen;
  • I am not covering it in this post, but maybe a Task<SomeCollectionType<T>> should be used as the result type to support asynchronous implementations;
  • Some developers may return lists simply because, when reading the database, they don't know how many records there are, and putting things into a list is their "natural" choice. Maybe they don't know the good practice of not returning modifiable collections; maybe they don't care, as a new list is created each time, so there's no problem if the receiver changes the result;
  • Some developers may return arrays because they are following some pattern, like the Reflection methods. People can insert new items into a list, not into an array (but they forget arrays aren't read-only: items can still be replaced);
  • Other developers will use a ReadOnlyCollection because they know the result is not expected to be modified;
  • And others will return IEnumerable, either because they learned it is the right thing to do, or because they want to use yield return and avoid pre-loading all the records.
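To make those options concrete, here are three of the shapes for the same data (the class and member names are mine, for illustration only):

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReturnShapes
{
    private static readonly string[] Names = { "Drinks", "Snacks" };

    // List: the "natural" choice when the record count is unknown,
    // but the caller can freely modify the returned instance.
    public static List<string> AsList()
    {
        return new List<string>(Names);
    }

    // ReadOnlyCollection: makes it explicit that the result is not
    // expected to be modified by the caller.
    public static ReadOnlyCollection<string> AsReadOnly()
    {
        return new ReadOnlyCollection<string>(Names);
    }

    // IEnumerable with yield return: items are produced lazily, so a
    // database-backed implementation could avoid pre-loading everything.
    public static IEnumerable<string> AsEnumerable()
    {
        foreach (var name in Names)
            yield return name;
    }
}
```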

No ORM

Notice that here I am not talking about using an ORM. An ORM would probably generate a similar set of classes, but those would be required by the ORM to map to the database, and that's not my focus. I am simply talking about the basic OO idea of having objects that contain data and behavior.
It would be possible to put these two classes into a shared DLL and end up with a web application and a native application pretty similar to the first case. That is, the native application would still have direct access to the database, by using a DLL that has direct access to the database.
The biggest difference is that we would always see Category and Product as objects, and the objects would do the database job. This would probably change how things are put into the tree view: either the objects are put into the tree view directly and a very basic data template shows the Name, or they are stored in a property like Tag, to be accessed later without extracting paths or Ids from the tree view items.
It would also be possible to have a case similar to the second one, creating a web service on top of these classes and writing the applications on top of the service. But then the OO approach would very likely exist only inside the service. For the two applications, things would stay as a stateless service, not as an object-oriented approach.

Object Oriented without Limitations

Wouldn't it be better if things started with the object oriented approach, allowing both applications to deal with Category and Product objects, but without the problems?
That is, the web application could use the objects directly and those objects would access the database. The native application would also be coded as if it were using those objects directly, but communicating with a web service.
So, is this possible?
And the answer is yes. This is what I mean when I talk about creating an application architecture without considering limitations. I don't need to think about stateless objects. Being stateless is a communication limitation, not part of the application architecture.
So the only "constraint" I use is that, to support all the different scenarios, I must start with interfaces, not classes.
That is, I will have an IProduct and an ICategory. These interfaces could look like this:
public interface ICategory
{
  ICategory Parent { get; }
  string Name { get; }

  IEnumerable<ICategory> SubCategories { get; }
  IEnumerable<IProduct> Products { get; }

  ICategory GetSubCategoryByName(string name);
  IProduct GetProductByName(string name);
}
public interface IProduct
{
  ICategory Category { get; }
  string Name { get; }
  string Description { get; }
  // Any other property appropriate for a product here.
}
As we can't have static methods on interfaces, we need an "entry point" to get the base categories. This can be another interface:
public interface IBaseCategories
{
  IEnumerable<ICategory> Categories { get; }

  ICategory GetCategoryByName(string name);
}
Having these interfaces we can write the two applications without any dependency on specific implementations. The interfaces can be implemented in completely different manners: implementations that load things from a database, that load things from XML, that use a service and, why not, a test implementation that simply instantiates two categories with two products each. That last one is what will probably happen when testing the applications for the first time: instead of dealing with an actual database connection, we simply make things work on top of the interfaces with a fake implementation.
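That fake implementation can be as small as this (a sketch: the interfaces are repeated here, reduced to the members used, so it compiles on its own; the Fake* class names and the category/product names are made up):

```csharp
using System.Collections.Generic;
using System.Linq;

// Reduced copies of the article's interfaces, typed with generics.
public interface IProduct { ICategory Category { get; } string Name { get; } }
public interface ICategory
{
    ICategory Parent { get; }
    string Name { get; }
    IEnumerable<ICategory> SubCategories { get; }
    IEnumerable<IProduct> Products { get; }
}
public interface IBaseCategories
{
    IEnumerable<ICategory> Categories { get; }
}

// Entirely in-memory fakes: no database, no service.
public class FakeProduct : IProduct
{
    public ICategory Category { get; set; }
    public string Name { get; set; }
}
public class FakeCategory : ICategory
{
    public ICategory Parent { get; set; }
    public string Name { get; set; }
    public IEnumerable<ICategory> SubCategories
    {
        get { return Enumerable.Empty<ICategory>(); }
    }
    public IEnumerable<IProduct> Products { get; set; }
}
public class FakeBaseCategories : IBaseCategories
{
    // Two base categories with two products each, as suggested above.
    public IEnumerable<ICategory> Categories
    {
        get
        {
            var drinks = new FakeCategory { Name = "Drinks" };
            drinks.Products = new IProduct[]
            {
                new FakeProduct { Category = drinks, Name = "Soda" },
                new FakeProduct { Category = drinks, Name = "Juice" }
            };
            var snacks = new FakeCategory { Name = "Snacks" };
            snacks.Products = new IProduct[]
            {
                new FakeProduct { Category = snacks, Name = "Chips" },
                new FakeProduct { Category = snacks, Name = "Cookies" }
            };
            return new ICategory[] { drinks, snacks };
        }
    }
}
```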
Of course, at some moment we will need to write an implementation that uses the database (which can have Id properties too), and we could end up making the bad choice of using that implementation in the native application. But replacing it with one that uses a web service will only require a one-line change, to instantiate a different IBaseCategories implementation (and, if done really correctly, it could even avoid recompiling the application, but I am ignoring that part of the architecture for now).
So, up to this moment, the most basic architecture, that I consider to be free of implementation problems is:
  1. Think about all the classes you need, as you would when writing a UML diagram, and, to guarantee that you are not bound to any implementation detail, write everything as interfaces;
  2. Actually, there's no item two. As long as the interfaces represent the right behavior and relationships between the objects, everything is OK at the architecture level. The applications can already be written on top of the interfaces.

Moving problems forward?

Having the interfaces first means that we can have any implementation. But we will need to write the implementations, right? And those implementations can have all sort of problems like they had before. So, aren't we simply moving the problems forward?
The answer is something between yes and no. If we blindly implement the interfaces, we can surely end up with an implementation that has all the problems of the other scenarios, and fixing them may require completely rewriting it. Yet such a "complete rewrite" of the implementation doesn't require a change to the applications.
In fact, it is even possible to ask another team to write the implementation of these interfaces, and they don't need the applications at all, as long as they have the interfaces. So it is guaranteed that the rework will be smaller.

Stateful and Stateless - The False Assumption

"We can replace implementations but we can't use a stateless web-service. These interfaces are stateful. Everybody knows that stateful services don't scale well."
This is probably the killer argument against starting with the more object-oriented interfaces. And it is actually a false argument.
It is true that, if we use the default .NET Remoting, the products and categories referenced by a client need to be kept alive on the server for as long as the client can use them, or else new requests for those objects will fail. Worse: the same computer must answer the requests, as all the "object ids" known by the client are the server's in-memory object ids.
If we use WCF we simply can't expose these interfaces directly as services because they aren't stateless.
Yet these are framework-specific limitations. Instead of using in-memory ids, the framework could very well send the actual database ids, or even the paths, as the identifying information, being capable of reloading the objects if needed and allowing different servers to answer new requests.
Want a proof of this?
It is possible to create a stateless web service and implement it by calling these "stateful" interfaces. And it is possible to create an implementation of the object-oriented (stateful) interfaces that stores the paths and uses that stateless service for the calls.
That is, the application can use the real implementation directly, or it can use objects that hold paths and redirect the requests to a stateless web service, which in turn is implemented to use the stateful objects to do the job.
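A sketch of that bridge (all names here are mine): the client-side object holds only a path and forwards every member to a stateless service, so it looks stateful to the application while nothing is kept alive on the server:

```csharp
using System.Collections.Generic;
using System.Linq;

// The stateless contract: every call carries the full path, so any
// server can answer it and nothing stays alive between requests.
public interface IStatelessCategoryService
{
    string[] GetSubCategoryNames(string path);
}

// Client-side stand-in for ICategory (reduced to the members needed
// here). The only state it holds is the path string.
public class CategoryProxy
{
    private readonly IStatelessCategoryService _service;
    private readonly string _path;

    public CategoryProxy(IStatelessCategoryService service, string path)
    {
        _service = service;
        _path = path;
    }

    // The name is the last segment of the path.
    public string Name
    {
        get
        {
            int slash = _path.LastIndexOf('/');
            return slash < 0 ? _path : _path.Substring(slash + 1);
        }
    }

    // Each access becomes one stateless call; the children returned
    // are themselves proxies holding only their own paths.
    public IEnumerable<CategoryProxy> SubCategories
    {
        get
        {
            return _service.GetSubCategoryNames(_path)
                .Select(name => new CategoryProxy(_service, _path + "/" + name));
        }
    }
}

// A hard-coded fake service, just to demonstrate the mechanism.
public class FakeStatelessService : IStatelessCategoryService
{
    public string[] GetSubCategoryNames(string path)
    {
        return path == "Drinks" ? new[] { "Sodas", "Juices" } : new string[0];
    }
}
```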
"OK, it works but it is a lot more work to make it run properly and all the extra work makes it a bad architecture."
Yes, it is a lot of work... if it is done by hand. A better framework will do that transparently or, even better, will simply work differently and avoid the problems altogether.
My purpose here is not to say that we should avoid stateless services, or that we must always lose time creating our own frameworks. My purpose is to show the difference between applications written without any architecture in mind; applications written with an architecture in mind, but with the architecture shaped by specific technology/framework limitations; and applications whose architecture is made before considering technology limitations.

Risks

An architecture unbound from technology limitations is not always a good thing. Some limitations can't be avoided, and others would require so much effort to overcome that it is better to accept them. I can say that at this moment the biggest limitation I see when designing any interface is the sync/async dilemma. An interface shouldn't expose implementation details, yet being synchronous or asynchronous is an implementation detail that affects the signature of the methods.
To me, that's the kind of technology limitation that we must consider when doing the architecture. If we considered only the most versatile design, without considering performance, it would probably be better to make all interface signatures asynchronous, as that supports all the cases. It is always possible to give a synchronous result when implementing a method with an asynchronous signature. The opposite is not always possible.
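That claim is easy to demonstrate: an in-memory implementation can satisfy an asynchronous signature by returning an already-completed task via Task.FromResult, while a synchronous signature gives a network-based implementation nowhere to go. A sketch (the interface and class names are mine):

```csharp
using System.Threading.Tasks;

// An asynchronous-looking contract...
public interface ICategoryNames
{
    Task<string[]> GetBaseCategoryNamesAsync();
}

// ...implemented synchronously: the data is already in memory, so an
// already-completed task is returned and nothing ever really waits.
public class InMemoryCategoryNames : ICategoryNames
{
    private readonly string[] _names = { "Drinks", "Snacks" };

    public Task<string[]> GetBaseCategoryNamesAsync()
    {
        return Task.FromResult(_names);
    }
}
```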
Yet making absolutely all interfaces asynchronous is a performance killer. So it becomes a matter of choice, based on the expected uses and implementations of the interfaces, and in some cases it is even valid to have both synchronous and asynchronous methods that do the same thing. I hope this kind of problem disappears in the future.

MVVM, MVC, ORMs and everything else

Up to this moment I didn't solve many problems. In all scenarios there's code directly in the form's code behind, and I clearly left all the implementations that access the database susceptible to SQL Injection. So I am presenting a terrible architecture, don't you think?
To be honest, I left all those details untouched on purpose. At this moment we have three main areas: the UI, the job to be done, and the abstraction that lets the other two talk. In some sense we can say this is a kind of MVVM or MVC, but it is not exactly the same. In MVC and MVVM all the layers are implementation layers. There's no real abstraction.
In the last solution I consider the UI and the job to be done as the implementation, and the abstraction (the interfaces) as the real architecture. That is, it doesn't matter if you use MVVM correctly: if your Model is bound to SQL Server and nothing else, it will be a problem to make things work through a web service or similar. If you have the right abstraction first, that change is pretty easy.
Yet it doesn't mean you should avoid MVVM or an ORM. In the end we always need a working application (or two, as the web application is not the native application), and having a good architecture is only the start. When implementing things, if an ORM will help the team write easier-to-read queries and avoid SQL Injection, they should go for it. If the code behind is a problem because the designers don't know what to do with it, then go for MVVM. Only remember that those are part of the implementation. Maybe you can consider them sub-architectures, as they will greatly influence the code that's going to be written, but the main architecture is built on the purpose of the application. MVVM, MVC and ORMs exist independently of the applications and should not be considered the architecture on their own.


Reference:
http://www.codeproject.com/Articles/889468/Software-Architecture-Examples
