I have spent four weekends in a row on a solution for two problems I ran into in the past:
- I wanted to save all images from a beautiful Teams channel message. Teams does not store these inline images in SharePoint. The only way to download them is to click on each image.
- We are doing a Tenant to Tenant migration at glueckkanja-gab after our merger. Most of the migration tools will support the migration of Teams chat, but not all tools are available, and some implementations are lacking features for a more flexible approach.
I started my career as a developer and my heart is still thinking Visual Basic. Do not be afraid, I migrated my dev skills to C# a long time ago and my arrays start at 0, not 1.
I really wanted to try a few things regarding Microsoft Graph, the Microsoft Graph SDK and Microsoft Teams. The Microsoft Teamwork part of the API has a solid starting point (if you look at the beta version), and it is growing into a very mature set of capabilities. I’m a huge fan of Azure Functions and I’ve done quite a few projects that talk to the Microsoft Graph using application permissions. I checked the documentation, and if I wanted to go this route I would have to request special permissions from Microsoft to access the content without a real user. For now I decided to go with a console application and an Azure AD device code flow.
I have published the source code on GitHub. Maybe it will give your own solution a kickstart. Just a quick disclaimer: A lot of this stuff is first-time code for me (DI in a console, Graph Auth provider, logging, …). I think at some points I over-engineered the solution and I got distracted from my real business problems ;)
https://github.com/marcoscheel/M365.TeamsBackup
In the following sections I will show you how I approached the problem, what the result of the backup looks like, how to set it up and how to run it yourself.
Check your migration vendor of choice first!
I have only limited experience with the following tools. But these vendors are the big players, and you can easily get in touch with them to get a demo or further information.
It is ok if you stop here if your migration needs are satisfied by one of these vendors. If you are interested in my approach it might still be worth reading on ;)
My goal
Since the beginning of the first lockdown in April, we at Glück & Kanja have been running a cooking event as a social online gathering. We are still doing it, and we have created a lot of content since the first meeting. The event is hosted as a Microsoft Teams channel meeting every Monday. Check out our blog post (German). The event has produced a ton of beautiful pictures, but they are stuck inside a Teams thread without easy access like SharePoint (if you consider SharePoint an easy access method). The backup should save all the attachments to the file system. Sneak preview of the result:
For our migration to the new tenant, we are running a set of tools. The tooling we have at hand has some limits regarding channel message migration. We wanted to preserve the chat messages of a Team without polluting the new target team. In some cases, we also want to archive the Team to a central location (a single SharePoint site collection). For this case we would need to export the chat to a readable and searchable format that preserves most of the content.
My goal is not to provide a ready to use backup solution. For some of your scenarios this could work. I wanted to also provide a code base to accommodate your more specific requirements. We don’t rely on extensive app integration in our teams, so handling the adaptive cards in a chat message is not really a part of the HTML generation solution. But the JSON from the Microsoft Graph will have all or some information included. If you need to handle adaptive card content this solution could be a great kickstart so you don’t have to write a lot of boilerplate code to get to the message attachment properties.
The solution will try to preserve as much information as needed from the Microsoft Graph and dump it to disk. From this data you can run multiple HTML conversions to get the content representation you need. The HTML generation no longer interacts with the Microsoft Graph, so it can be rerun months after the original content was deleted (for example after the old tenant has been decommissioned).
My current focus is on the following data:
- Basic team metadata including members
- Channel metadata including members for private channels
- Messages and replies with author and dates
- Message body and inline pictures (hosted content)
My approach
As mentioned, I split the solution into two parts. The first part talks to the Microsoft Graph and stores each response as a JSON file, using the following layout:
- For every Team I create a folder named after the ID of the group. The response of the team request is stored in that folder as “team.json”.
- For every channel I create a folder named after the channel ID, and the channel response is stored as “channel.json”.
- For every message (the entry point of a thread if replies are available) I create a folder named after the message ID, and the message response is stored as “message.json”.
- For every reply to a message I create another file with the pattern “message.{messageid}.json”.
Every message is checked for inline content. This is called hosted content, and I treat every item as an inline image stored as a PNG file. Because the ID of a hosted content item is very long, I use an MD5 hash of the ID in the filename. For the root message the file is named “hostedcontent.{hostedcontenidMD5}.png” and for replies “hostedcontent.{messageid}.{hostedcontenidMD5}.png”.
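To make the naming scheme concrete, here is a minimal sketch of how such paths could be built. It is not the code from the repository: the class `BackupPaths`, its method names, the nesting of team, channel and message folders, and the choice of UTF-8 plus lowercase hex for the MD5 hash are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not taken from the repository.
public static class BackupPaths
{
    // Lowercase hex MD5 of the hosted content ID. The hash is only used
    // to get a short, fixed-length file name, not for security.
    // Assumption: the ID is hashed as UTF-8 text.
    public static string Md5Hex(string value)
    {
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(value));
        return Convert.ToHexString(hash).ToLowerInvariant(); // .NET 5+
    }

    // Assumption: folders are nested as {root}/{teamId}/{channelId}/{messageId}.
    public static string MessageFolder(string root, string teamId, string channelId, string messageId)
        => Path.Combine(root, teamId, channelId, messageId);

    // File for the root message of a thread.
    public static string RootMessageFile(string messageFolder)
        => Path.Combine(messageFolder, "message.json");

    // File for a reply inside the thread folder.
    public static string ReplyFile(string messageFolder, string replyId)
        => Path.Combine(messageFolder, $"message.{replyId}.json");

    // Image file for a hosted content item. Pass replyId only when the
    // image belongs to a reply instead of the root message.
    public static string HostedContentFile(string messageFolder, string hostedContentId, string replyId = null)
    {
        string hash = Md5Hex(hostedContentId);
        string name = replyId == null
            ? $"hostedcontent.{hash}.png"
            : $"hostedcontent.{replyId}.{hash}.png";
        return Path.Combine(messageFolder, name);
    }
}
```

Whatever the ID looks like, the hashed part of the file name is always 32 characters long, which keeps the full path well below typical path length limits.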
Of course, all this needs some kind of authentication. As mentioned, I’m a big fan of application permissions because there is no user involved. But the current Graph implementation only offers access to chat messages if you apply for an app with protected API access. Read more about this here. I applied, but four weeks later I still don’t have feedback on my request (a single-tenant app should be easy), so I implemented the access using delegated permissions based on the device code flow. I then turned this limitation into a feature: by default, the tool backs up all teams the account is a member of. This way an admin can add the account to a Team, or even the owner of a Team can add it to “request” a chat backup.
The other console application takes care of building the HTML from the JSON files. Because all the heavy lifting is already done in the first part, the generation of HTML is really fast. The application needs the source directory and the HTML template for the output. The HTML template currently has some inline styles to make it easy to move around and to keep the dependencies low. The application creates an HTML file for each channel and optionally an HTML file for every thread. Based on the configuration, the HTML file will contain all images inline or reference them as separate files. If you go with inline images you get a very portable version of the backup, but also a big file if you have many images or a long chat history. The combination of inline images and one HTML file per thread will give you a great choice out of the box. If you would like to customize the HTML look and feel, have a peek at the template file. With a few tweaks to the CSS styles you can enhance readability and change it to your preferences. I hopefully selected some easy-to-understand selectors.
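Because the second step only reads from disk, walking the backup is plain file system code. The following is a minimal sketch of how the thread folders and reply files of one channel could be discovered. The class `BackupReader` and its methods are hypothetical, and the sketch assumes that thread folders sit directly inside the channel folder, as in the naming scheme described earlier.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not taken from the repository.
public static class BackupReader
{
    // Thread folders of one channel: every direct subfolder that
    // contains a message.json (the root message of the thread).
    public static IEnumerable<string> ThreadFolders(string channelFolder) =>
        Directory.EnumerateDirectories(channelFolder)
                 .Where(dir => File.Exists(Path.Combine(dir, "message.json")))
                 .OrderBy(dir => dir, StringComparer.Ordinal);

    // Reply files of one thread: message.{messageid}.json,
    // explicitly excluding the root message.json.
    public static IEnumerable<string> ReplyFiles(string threadFolder) =>
        Directory.EnumerateFiles(threadFolder, "message.*.json")
                 .Where(file => Path.GetFileName(file) != "message.json")
                 .OrderBy(file => file, StringComparer.Ordinal);
}
```

A generator built this way never needs a token or network access, which is what makes a later rerun against an archived backup possible.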
The code
Let’s have a quick look at the code and what it takes to get it compiled if you pull it from GitHub. I am doing all my development in Visual Studio “proper”. The solution was created with Visual Studio 2019 Enterprise (Preview), but the non-preview Community edition should also be fine. I write most of my code in C#, and so is this code. As .NET 5 is now available, this is my first solution using this version, but I think you can get it running on .NET Core 3.1 if you downgrade the solution and packages. The final solution is hosted on GitHub and you are welcome to open an issue or create a pull request. Just have a little bit of patience, as this is just a side hustle for me.
Normally I write Azure Functions, and dependency injection is not yet in my DNA. If you look at the code this might feel a little awkward. I spent way too much time getting the console app to make use of the DI concepts. For a start it feels OK and I had a lot of fun learning to code this, but I’m not sure it is 100% correct. Let’s put it this way: It works! Another thing I love about Azure Functions is the native Microsoft.Extensions.Logging integration, so it was natural to write my logs the same way here. In the cloud, logging to a file on disk is not really a thing, and that might be the reason why Microsoft does not have an out-of-the-box solution for it. That’s my excuse why all of my logs are only on the console for the moment. I am looking into NLog or Serilog. The biggest benefit is that the log levels can be configured very easily. Check out the application.json for a sample. The configuration system is also based on Microsoft standards. The console app loads the application.json settings, and during development an argument can be passed in to select the correct JSON file. Special thanks to David Feldman for his blog post, which got me started on the console DI thing.
As of now we didn’t write any business-relevant code (developers love to write non-relevant code 😁), so let’s get our hands dirty. Getting the data from Microsoft Teams is done via the Microsoft Graph. I’m using the beta SDK because for the Teams workload some features are only available in the non-production endpoint. Also, I like to play with fire. The authentication (as mentioned: I had to go with the device code) is provided by the MSAL libraries. For my “daemon apps” (Azure Functions) I rely on the pure MSAL implementation, but for this application I tried something new and used the Microsoft Graph Auth libraries. The NuGet package is still in preview, but this was the easiest way to get the device code flow up and running in minutes. I plan to use this library in the future for my other projects. One thing was missing and was really annoying: I had to authenticate on every debug run, so I copied some code to persist the token to a file. This code is also standard Microsoft code, and it will put the token in a locally protected file. For the generation of the HTML file I visited an old friend: HTML Agility Pack (HAP). This is an awesome library and working with the HTML DOM is a breeze! The HTML from the Teams chat message can contain images (hosted content) pointing to the Microsoft Graph endpoint. Using the HTML Agility Pack, I search for these images and replace the src with a base64-encoded version or a local file reference.
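The src replacement can be illustrated without any NuGet dependency. The sketch below deliberately swaps the HTML Agility Pack for a regular expression to stay self-contained, so it is not how the tool itself does it and it is far less robust than a real DOM parser. It assumes that hosted content URLs end in `/hostedContents/{id}/$value` and that the src attribute uses double quotes; the class `HostedContentRewriter` and the resolver callback are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper. The real tool uses HTML Agility Pack; a regex is
// used here only to keep the example free of external packages.
public static class HostedContentRewriter
{
    // Group 1: everything up to and including src="
    // Group 2: the hosted content ID (assumed URL shape, see lead-in)
    // Group 3: the closing quote of the src attribute
    private static readonly Regex ImgSrc = new Regex(
        "(<img\\b[^>]*?\\bsrc=\")[^\"]*/hostedContents/([^/\"]+)/\\$value(\")",
        RegexOptions.IgnoreCase);

    // Builds a data URI so the image can live inside the HTML file itself.
    public static string ToDataUri(byte[] png) =>
        "data:image/png;base64," + Convert.ToBase64String(png);

    // Replaces every hosted content src with whatever the resolver returns
    // for the hosted content ID: a data URI or a relative file name.
    public static string Rewrite(string html, Func<string, string> resolveSrc) =>
        ImgSrc.Replace(html, match =>
            match.Groups[1].Value + resolveSrc(match.Groups[2].Value) + match.Groups[3].Value);
}
```

Passing a resolver that reads the PNG from disk and calls `ToDataUri` gives the portable single-file variant, while a resolver that returns the local file name gives the smaller HTML with separate image files.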
The result
Here is a side-by-side comparison. The original channel in Microsoft Teams and the HTML backup generated from the Microsoft Graph API.
The HTML file will contain the Team name, creation date and the member count. The channel name is also part of every file and, for a private channel, so is the member count. Every post will contain the creation time (plus the edit time if available), the author and the body including images. Links to documents are not modified and the files will not be downloaded. The content of an adaptive card is available in the JSON, but currently it is not rendered in the HTML. Adaptive card rendering will be added later.
After the complete run you have a set of JSON files representing Microsoft Graph SDK classes. I recommend putting them into a ZIP file (there are a lot of files) and placing it next to the HTML output. Based on your data and configuration, the output can be stored in a SharePoint library or in a file-based archive.
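If you want to build your own output from the stored JSON, the fields mentioned above can be read with System.Text.Json. This is a minimal sketch under the assumption that the saved files keep the property names of the Microsoft Graph chatMessage resource (`createdDateTime`, `lastEditedDateTime`, `from.user.displayName`); the class `MessageSummary` is a hypothetical name and not part of the repository.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not taken from the repository.
public static class MessageSummary
{
    // Returns "author, created" plus an "(edited ...)" suffix when the
    // message has an edit timestamp. Assumes Graph chatMessage property names.
    public static string Describe(string messageJson)
    {
        using JsonDocument doc = JsonDocument.Parse(messageJson);
        JsonElement root = doc.RootElement;

        string created = root.GetProperty("createdDateTime").GetString();

        // "from" can be null, for example for system messages.
        string author = "(unknown)";
        if (root.TryGetProperty("from", out JsonElement sender) && sender.ValueKind == JsonValueKind.Object
            && sender.TryGetProperty("user", out JsonElement user) && user.ValueKind == JsonValueKind.Object
            && user.TryGetProperty("displayName", out JsonElement name))
        {
            author = name.GetString() ?? author;
        }

        // The edit timestamp is only a string when the message was edited.
        string edited = root.TryGetProperty("lastEditedDateTime", out JsonElement editedAt)
                        && editedAt.ValueKind == JsonValueKind.String
            ? $" (edited {editedAt.GetString()})"
            : "";

        return $"{author}, {created}{edited}";
    }
}
```

The same pattern extends to `body.content` for the message HTML or to the attachments array if you need to handle adaptive cards yourself.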
Your setup
Currently the setup is not download-and-run. First, I don’t want to offer a multi-tenant application in my Azure AD. Therefore, you need to talk to your admin to register the application in your own tenant. Because of the permissions it requires, the app needs admin consent in any case. Please check out the project Readme.md to get the needed permissions and put the needed details in the application settings file. The binaries can be downloaded from the release page of the GitHub project. You will need to have the .NET 5 runtime installed.
Summary
I had a blast writing the solution from start to end. I will tweak the solution in the near future. These are the next ideas I want to code:
- Save images with the date in the name and add EXIF data with the author information
- Allow the use of a non-interactive AAD login (confidential client)
- Integrate features in an Azure Function and maybe trigger from a Teams Message Extension for an ad-hoc export for every user
I created the solution for a very specific use case (our tenant migration), but I hope that by making the source code available I can help you solve your own problems. If you have feedback, please let me know. Create an issue, or hit me up on Twitter or LinkedIn.