...

In our initial version, as mentioned before, we perform a full database reset. In other words, every time a new test scenario is executed, we delete the database, recreate it, install the ODS, and install the AMT. This process consumes too much time. The goal of this extension is to see whether we can improve this performance by reviewing 3 possible options:

  1. Instead of doing the full database reset, what about doing a truncate on all tables?
  2. Instead of doing the full database reset, implement Respawn. More information about this tool is available here. A sketch of how both options could look is included after this list.
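To make those two options concrete, here is a minimal sketch of how each reset could look in a shared test helper. This is not code from the spike branches: the class shape, connection handling, and schema names are assumptions for illustration, and the Respawn usage follows the library's Checkpoint-style API, which may differ depending on the Respawn version we adopt.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;
using Respawn;

public static class TestDatabaseReset
{
    // Option 1: clear every table by hand. DELETE is used instead of TRUNCATE because
    // TRUNCATE fails on tables referenced by foreign keys; constraints are disabled,
    // rows are deleted, and constraints are re-enabled with re-validation.
    public static void DeleteAllRows(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var command = connection.CreateCommand())
            {
                command.CommandText = @"
                    EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
                    EXEC sp_MSforeachtable 'DELETE FROM ?';
                    EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';";
                command.ExecuteNonQuery();
            }
        }
    }

    // Option 2: let Respawn work out the delete order from the foreign-key graph.
    private static readonly Checkpoint Checkpoint = new Checkpoint
    {
        SchemasToInclude = new[] { "edfi", "analytics" } // hypothetical schema names
    };

    public static Task ResetWithRespawnAsync(string connectionString) =>
        Checkpoint.Reset(connectionString);
}
```

Both approaches leave the schema, the ODS, and the AMT views in place and only remove data, which is why neither of them pays the cost of the full drop/recreate/install cycle.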

...

  3. Determine if we need to start with an empty database for each of our test cases.

In relation to the first 2 options, the main idea is to compare both approaches and see if one of them has better performance.

...

So, as you can see, the performance is very similar. Whether we implement the all-tables truncate or Respawn, performance should not be a factor in the decision; rather, we should consider other reasons to choose one or the other.

Additionally, I executed the unit tests associated with DateDim, on SQL Server, for Data Standard 3.2 only; in other words, the exact same tests (the 51 unit tests). It took 42 seconds, essentially the same amount of time.

Conclusion

In relation to the first and second options, we can implement either the all-tables truncate or Respawn. Whether we will implement one or the other is still an open question, but I think the approach we take should not be based on performance.

Branches

...

Determine if we need to start with an empty database for each of our test cases

These are the conclusions based on our review:

Right now the test cases use a specific data set for each view. The data used for the tests is designed to verify the different conditions of a view and to determine whether a change to the view can affect the results. To this end, the data must meet certain conditions regarding column relationships and values. With that in mind:

  • The data extracted may come from different sample databases, according to the version used by each developer, or it may have been included to model a specific case. Therefore, we are not working with a single data set.
  • The advantage of working with separate data sets is the independence of the test cases: they do not depend on the data from other scenarios to execute. A test case is also not affected by the data that another developer uses, allowing developers to work in parallel (see the sketch after this list).
  • For testing purposes, some data is modified to create additional test scenarios. For example, for two different views, a student can have different data, such as null columns or different values that can affect the tests.
  • Having a single set of test data requires more design effort so that the global data fits all test cases. This can be complex because, for example, the descriptors can vary according to the data in each sample database.
  • I do not recommend using a single or incremental data set. Designing the tests for each of the views can be a complex process, and designing the data while considering its effects on other test scenarios would go against the unit test concept. In addition, it can increase complexity both when designing the data and when several developers are working on test scenarios and need to add data; independence and parallelism would be lost.
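To illustrate the independence described above, here is a minimal sketch of a per-view test fixture that seeds its own data set before running its assertions. The helper class TestDatabase, its methods, the data file name, and the use of NUnit are all hypothetical and only meant to show the shape of the idea; the actual test harness may look different.

```csharp
using System.Threading.Tasks;
using NUnit.Framework;

[TestFixture]
public class StudentDimTests // hypothetical view under test
{
    [SetUp]
    public async Task LoadScenarioData()
    {
        // Each fixture resets the database and loads only its own data file, so
        // changing the data for one view cannot break another view's tests and
        // developers can add scenarios in parallel.
        await TestDatabase.ResetAsync();                       // hypothetical reset helper
        await TestDatabase.LoadSqlFileAsync("StudentDim.sql"); // hypothetical per-view data set
    }

    [Test]
    public async Task Returns_one_row_per_enrolled_student()
    {
        var rows = await TestDatabase.QueryAsync("SELECT * FROM analytics.StudentDim");
        Assert.That(rows, Is.Not.Empty);
    }
}
```

Because each fixture owns its data, the reset strategy chosen above (truncate or Respawn) only needs to guarantee an empty database at the start of each scenario.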

Branches

We generated 2 branches just for general review. The code here is the code I moved from the initial spike (spike-postgres-tests), including the changes I made in order to accomplish the goal of the spike (Ed-Fi Issue Tracker: BIA-761).

...